Re: [PATCH -next] powerpc/book3e: Fix build error

2022-05-17 Thread Michael Ellerman
YueHaibing  writes:
> On 2022/5/17 18:45, Christophe Leroy wrote:
>> Le 17/05/2022 à 11:48, YueHaibing a écrit :
>>> arch/powerpc/mm/nohash/fsl_book3e.c: In function ‘relocate_init’:
>>> arch/powerpc/mm/nohash/fsl_book3e.c:348:2: error: implicit declaration of 
>>> function ‘early_get_first_memblock_info’ 
>>> [-Werror=implicit-function-declaration]
>>>early_get_first_memblock_info(__va(dt_ptr), );
>>>^
>>>
>>> Add missing include file linux/of_fdt.h to fix this.
>>>
>>> Signed-off-by: YueHaibing 
>> 
>> Thats for fixing that.
>> 
>> Reviewed-by: Christophe Leroy 
>> 
>> It means we don't have any defconfig for 32 bits booke with 
>> CONFIG_RELOCATABLE ?
>
> Indeed, there is no defconfig with CONFIG_RELOCATABLE under 
> arch/powerpc/configs

It's selected by CRASH_DUMP, which is in ppc64_defconfig.

But it's not enabled in corenet32_smp_defconfig which is what I build,
or any of the other 85xx configs.

I guess it should be, I think it's true that RELOCATABLE=y exercises
more interesting code paths?

cheers


Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-17 Thread Michael Ellerman
"Eric W. Biederman"  writes:
> Looking at this the pr_err is absolutely needed.  If an unsupported case
> winds up in the purgatory blob and the code can't handle it things
> will fail silently much worse later.

It won't fail later, it will fail the syscall.

sys_kexec_file_load()
  kimage_file_alloc_init()
kimage_file_prepare_segments()
  arch_kexec_kernel_image_load()
kexec_image_load_default()
  image->fops->load()
elf64_load()# powerpc
bzImage64_load()# x86
  kexec_load_purgatory()
kexec_apply_relocations()

Which does:

if (relsec->sh_type == SHT_RELA)
ret = arch_kexec_apply_relocations_add(pi, section,
   relsec, symtab);
else if (relsec->sh_type == SHT_REL)
ret = arch_kexec_apply_relocations(pi, section,
   relsec, symtab);
if (ret)
return ret;

And that error is bubbled all the way back up. So as long as
arch_kexec_apply_relocations() returns an error the syscall will fail
back to userspace and there'll be an error message at that level.

It's true that having nothing printed in dmesg makes it harder to work
out why the syscall failed. But it's a kernel bug if there are unhandled
relocations in the kernel-supplied purgatory code, so a user really has
no way to do anything about the error even if it is printed.

> "Naveen N. Rao"  writes:
>
>> Baoquan He wrote:
>>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
 kexec_load_purgatory() can fail for many reasons - there is no need to
 print an error when encountering unsupported relocations.
 This solves a build issue on powerpc with binutils v2.36 and newer [1].
 Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
 symbols") [2], binutils started dropping section symbols that it thought
>>> I am not familiar with binutils, while wondering if this exists in other
>>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>>> have problem with it?
>>
>> I'm not aware of this specific file causing a problem on other architectures 
>> -
>> perhaps the config options differ enough. There are however more reports of
>> similar issues affecting other architectures with the llvm integrated 
>> assembler:
>> https://github.com/ClangBuiltLinux/linux/issues/981
>>
>>>
 were unused.  This isn't an issue in general, but with kexec_file.c, gcc
 is placing kexec_arch_apply_relocations[_add] into a separate
 .text.unlikely section and the section symbol ".text.unlikely" is being
 dropped. Due to this, recordmcount is unable to find a non-weak symbol
>>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>>
>> Yes. Note that it is just the section symbol that gets dropped. The section 
>> is
>> still present and will continue to hold the symbols for the functions
>> themselves.
>
> So we have a case where binutils thinks it is doing something useful
> and our kernel specific tool gets tripped up by it.

It's not just binutils, the LLVM assembler has the same behavior.

> Reading the recordmcount code it looks like it is finding any symbol
> within a section but ignoring weak symbols.  So I suspect the only
> remaining symbol in the section is __weak and that confuses
> recordmcount.
>
> Does removing the __weak annotation on those functions fix the build
> error?  If so we can restructure the kexec code to simply not use __weak
> symbols.
>
> Otherwise the fix needs to be in recordmcount or binutils, and we should
> loop whoever maintains recordmcount in to see what they can do.

It seems that recordmcount is not really maintained anymore now that x86
uses objtool?

There've been several threads about fixing recordmcount, but none of
them seem to have lead to a solution.

These weak symbol vs recordmcount problems have been worked around going
back as far as 2020:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9

cheers


[powerpc:topic/ppc-kvm] BUILD SUCCESS 2852ebfa10afdcefff35ec72c8da97141df9845c

2022-05-17 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
topic/ppc-kvm
branch HEAD: 2852ebfa10afdcefff35ec72c8da97141df9845c  KVM: PPC: Book3S HV 
Nested: L2 LPCR should inherit L1 LPES setting

elapsed time: 724m

configs tested: 119
configs skipped: 104

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm64   defconfig
arm64allyesconfig
arm  allmodconfig
arm defconfig
arm  allyesconfig
i386 randconfig-c001-20220516
m68km5407c3_defconfig
m68k allmodconfig
sparc   sparc64_defconfig
mipsar7_defconfig
s390 allyesconfig
sh   se7712_defconfig
arm  pxa3xx_defconfig
sh   se7206_defconfig
sh   sh7724_generic_defconfig
m68k   m5249evb_defconfig
arm   h5000_defconfig
sh   se7724_defconfig
powerpc mpc85xx_cds_defconfig
riscv nommu_k210_sdcard_defconfig
m68k   m5208evb_defconfig
sh sh03_defconfig
shecovec24-romimage_defconfig
sh  r7780mp_defconfig
powerpc   ppc64_defconfig
arcnsimosci_defconfig
powerpc pq2fads_defconfig
sh ecovec24_defconfig
mipsgpr_defconfig
m68k   bvme6000_defconfig
arm  randconfig-c002-20220516
x86_64   randconfig-c001-20220516
ia64defconfig
m68k allyesconfig
m68kdefconfig
cskydefconfig
nios2allyesconfig
alpha   defconfig
alphaallyesconfig
nios2   defconfig
arc  allyesconfig
h8300allyesconfig
xtensa   allyesconfig
arc defconfig
sh   allmodconfig
s390defconfig
s390 allmodconfig
parisc  defconfig
parisc64defconfig
parisc   allyesconfig
sparc   defconfig
i386 allyesconfig
sparcallyesconfig
i386defconfig
i386   debian-10.3-kselftests
i386  debian-10.3
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc   allnoconfig
powerpc  allmodconfig
x86_64   randconfig-a012-20220516
x86_64   randconfig-a016-20220516
x86_64   randconfig-a011-20220516
x86_64   randconfig-a014-20220516
x86_64   randconfig-a013-20220516
x86_64   randconfig-a015-20220516
i386 randconfig-a016-20220516
i386 randconfig-a013-20220516
i386 randconfig-a015-20220516
i386 randconfig-a012-20220516
i386 randconfig-a014-20220516
i386 randconfig-a011-20220516
i386  randconfig-a012
i386  randconfig-a014
i386  randconfig-a016
s390 randconfig-r044-20220516
riscvrandconfig-r042-20220516
arc  randconfig-r043-20220516
riscv   defconfig
riscvnommu_virt_defconfig
riscv  rv32_defconfig
riscvnommu_k210_defconfig
riscv allnoconfig
riscvallmodconfig
riscvallyesconfig
x86_64rhel-8.3-kselftests
um   x86_64_defconfig
um i386_defconfig
x86_64  rhel-8.3-func
x86_64   rhel-8.3-syz
x86_64  kexec
x86_64  defconfig
x86_64   allyesconfig
x86_64 rhel-8.3-kunit
x86_64   rhel-8.3

clang tested configs:
powerpc  randconfig-c003-20220516
riscvrandconfig-c006-20220516
mips randconfig-c004-20220516
arm  

Re: [PATCH v4] locking/csd_lock: change csdlock_debug from early_param to __setup

2022-05-17 Thread Paul E. McKenney
On Tue, May 17, 2022 at 11:22:04AM +0800, Chen Zhongjin wrote:
> On 2022/5/10 17:46, Chen Zhongjin wrote:
> > csdlock_debug uses early_param and static_branch_enable() to enable
> > csd_lock_wait feature, which triggers a panic on arm64 with config:
> > CONFIG_SPARSEMEM=y
> > CONFIG_SPARSEMEM_VMEMMAP=n
> > 
> > With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in
> > static_key_enable() and returns NULL which makes NULL dereference
> > because mem_section is initialized in sparse_init() which is later
> > than parse_early_param() stage.
> > 
> > For powerpc this is also broken, because early_param stage is
> > earlier than jump_label_init() so static_key_enable won't work.
> > powerpc throws an warning: "static key 'xxx' used before call
> > to jump_label_init()".
> > 
> > Thus, early_param is too early for csd_lock_wait to run
> > static_branch_enable(), so changes it to __setup to fix these.
> > 
> > Fixes: 8d0968cc6b8f ("locking/csd_lock: Add boot parameter for controlling 
> > CSD lock debugging")
> > Cc: sta...@vger.kernel.org
> > Reported-by: Chen jingwen 
> > Signed-off-by: Chen Zhongjin 
> > ---
> > Change v3 -> v4:
> > Fix title and description because this fix is also applied
> > to powerpc.
> > For more detailed arm64 bug report see:
> > https://lore.kernel.org/linux-arm-kernel/e8715911-f835-059d-27f8-cc5f5ad30...@huawei.com/t/
> > 
> > Change v2 -> v3:
> > Add module name in title
> > 
> > Change v1 -> v2:
> > Fix return 1 for __setup
> > ---
> >  kernel/smp.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index 65a630f62363..381eb15cd28f 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -174,9 +174,9 @@ static int __init csdlock_debug(char *str)
> > if (val)
> > static_branch_enable(_debug_enabled);
> >  
> > -   return 0;
> > +   return 1;
> >  }
> > -early_param("csdlock_debug", csdlock_debug);
> > +__setup("csdlock_debug=", csdlock_debug);
> >  
> >  static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
> >  static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
> 
> Ping for review. Thanks!

I have pulled it into -rcu for testing and further review.  It might
well need to go through some other path, though.

Thanx, Paul


Re: [PATCH] net: unexport csum_and_copy_{from,to}_user

2022-05-17 Thread Michael Ellerman
Al Viro  writes:
> On Thu, Apr 21, 2022 at 09:04:40AM +0200, Christoph Hellwig wrote:
>> csum_and_copy_from_user and csum_and_copy_to_user are exported by
>> a few architectures, but not actually used in modular code.  Drop
>> the exports.
>> 
>> Signed-off-by: Christoph Hellwig 
>
> Acked-by: Al Viro 
>
> Not sure which tree should it go through - Arnd's, perhaps?

It's already in akpm's tree:

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-nonmm-stable=6308499b5e99c0c903fde2c605e41d9a86c4be6c

cheers


RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Luck, Tony
> What I'm planning to do in the altera_edac notifier is:
>
> if (kdump_is_set)
>   return;

Yes. That's what I think should happen.

-Tony


Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 14:02, Luck, Tony wrote:
>> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
>> we run the code as-is? Does that make sense to you?
> 
> The "skip" option sounds like it needs some special flag associated with
> an entry on the notifier chain. But there are other notifier chains ... so 
> that
> sounds messy to me.
> 
> Just all the notifiers in priority order. If any want to take different 
> actions
> based on kdump status, change the code. That seems more flexible than
> an "all or nothing" approach by skipping.
> 
> -Tony

I guess I've expressed myself in a poor way - sorry!

What I'm planning to do in the altera_edac notifier is:

if (kdump_is_set)
 return;

/* regular code */

In other words: if the kdump is set, this notifier will be effectively a
nop (although it's gonna be called).

Lemme know your thoughts Tony, if that makes sense.
Thanks,


Guilherme


RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Luck, Tony
> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
> we run the code as-is? Does that make sense to you?

The "skip" option sounds like it needs some special flag associated with
an entry on the notifier chain. But there are other notifier chains ... so that
sounds messy to me.

Just all the notifiers in priority order. If any want to take different actions
based on kdump status, change the code. That seems more flexible than
an "all or nothing" approach by skipping.

-Tony


Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 11:11, Petr Mladek wrote:
> [...]
>>> Then notifiers could make an informed choice on whether to deep dive to
>>> get all the possible details (when there is no kdump) or just skim the high
>>> level stuff (to maximize chance of getting a successful kdump).
>>>
>>> -Tony
>>
>> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?
> 
> I like this idea.
> 
> One small problem is that kexec_crash_loaded() has valid result
> only under kexec_mutex. On the other hand, it should stay true
> once loaded so that the small race window should be innocent.
> 
>> With that, are you/Petr/Dinh OK in moving it for the info list?
> 
> Sounds good to me.
> 
> Best Regards,
> Petr

Perfect, I'll do that for V2 then =)

Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
we run the code as-is? Does that make sense to you?

I'll postpone it to run almost in the end of info list (last position is
for panic_print).

Thanks,


Guilherme


Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 10:57, Petr Mladek wrote:
> [...]
 --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
 +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
 @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const 
 struct pci_device_id *ent)
>>> [... snip ...]
>>> It seems to reset some hardware or so. IMHO, it should go into the
>>> pre-reboot list.
>>
>> Mixed feelings here, I'm looping Broadcom maintainers to comment.
>> (CC Scott and Broadcom list)
>>
>> I'm afraid it breaks kdump if this device is not reset beforehand - it's
>> a doorbell write, so not high risk I think...
>>
>> But in case the not-reset device can be probed normally in kdump kernel,
>> then I'm fine in moving this to the reboot list! I don't have the HW to
>> test myself.
> 
> Good question. Well, it if has to be called before kdump then
> even "hypervisor" list is a wrong place because is not always
> called before kdump.

Agreed! I'll defer that to Scott and Broadcom folks to comment.
If it's not strictly necessary, I'll happily move it to the reboot list.

If necessary, we could use the machine_crash_kexec() approach, but we'll
fall into the case arm64 doesn't support it and I'm not sure if this
device is available for arm - again a question for the maintainers.


>  [...]
 --- a/drivers/power/reset/ltc2952-poweroff.c
 +++ b/drivers/power/reset/ltc2952-poweroff.c
>> [...]
>> This is setting a variable only, and once it's set (data->kernel_panic
>> is the bool's name), it just bails out the IRQ handler and a timer
>> setting - this timer seems kinda tricky, so bailing out ASAP makes sense
>> IMHO.
> 
> IMHO, the timer informs the hardware that the system is still alive
> in the middle of panic(). If the timer is not working then the
> hardware (chip) will think that the system frozen in panic()
> and will power off the system. See the comments in
> drivers/power/reset/ltc2952-poweroff.c:
> [ snip ...]
> IMHO, we really have to keep it alive until we reach the reboot stage.
> 
> Another question is how it actually works when the interrupts are
> disabled during panic() and the timer callbacks are not handled.

Agreed here! Guess I can move this one the reboot list, fine by me.
Unless PM folks think otherwise.


> [...]
>> Disagree here, I'm CCing Florian for information.
>>
>> This notifier preserves RAM so it's *very interesting* if we have
>> kmsg_dump() for example, but maybe might be also relevant in case kdump
>> kernel is configured to store something in a persistent RAM (then,
>> without this notifier, after kdump reboots the system data would be lost).
> 
> I see. It is actually similar problem as with
> drivers/firmware/google/gsmi.c.
> 
> I does similar things like kmsg_dump() so it should be called in
> the same location (after info notifier list and before kdump).
> 
> A solution might be to put it at these notifiers at the very
> end of the "info" list or make extra "dump" notifier list.

Here I still disagree. I've commented in the other response thread
(about Google gsmi) about the semantics of the hypervisor list, but
again: this list should contain callbacks that

(a) Should run early, _by default_ before a kdump;
(b) Communicate with the firmware/hypervisor in a "low-risk" way;

Imagine a scenario where users configure kdump kernel to save something
in a persistent form in DRAM - it'd be like a late pstore, in the next
kernel. This callback enables that, it's meant to inform FW "hey, panic
happened, please from now on don't clear the RAM in the next FW-reboot".
I don't see a reason to postpone that - let's see if the maintainers
have an opinion.

Cheers,


Guilherme


Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 10:28, Petr Mladek wrote:
> [...]
>>> Disagree here. I'm looping Google maintainers, so they can comment.
>>> (CCed Evan, David, Julius)
>>>
>>> This notifier is clearly a hypervisor notification mechanism. I've fixed
>>> a locking stuff there (in previous patch), I feel it's low-risk but even
>>> if it's mid-risk, the class of such callback remains a perfect fit with
>>> the hypervisor list IMHO.
>>
>> This logs a panic to our "eventlog", a tiny logging area in SPI flash
>> for critical and power-related events. In some cases this ends up
>> being the only clue we get in a Chromebook feedback report that a
>> panic occurred, so from my perspective moving it to the front of the
>> line seems like a good idea.
> 
> IMHO, this would really better fit into the pre-reboot notifier list:
> 
>+ the callback stores the log so it is similar to kmsg_dump()
>  or console_flush_on_panic()
> 
>+ the callback should be proceed after "info" notifiers
>  that might add some other useful information.
> 
> Honestly, I am not sure what exactly hypervisor callbacks do. But I
> think that they do not try to extract the kernel log because they
> would need to handle the internal format.
> 

I guess the main point in your response is : "I am not sure what exactly
hypervisor callbacks do". We need to be sure about the semantics of such
list, and agree on that.

So, my opinion about this first list, that we call "hypervisor list",
is: it contains callbacks that

(1) should run early, preferably before kdump (or even if kdump isn't
set, should run ASAP);

(2) these callbacks perform some communication with an abstraction that
runs "below" the kernel, like a firmware or hypervisor. Classic example:
pvpanic, that communicates with VMM (usually qemu) and allow such VMM to
snapshot the full guest memory, for example.

(3) Should be low-risk. What defines risk is the level of reliability of
subsequent operations - if the callback have 50% of chance of "bricking"
the system totally and prevent kdump / kmsg_dump() / reboot , this is
high risk one for example.

Some good fits IMO: pvpanic, sstate_panic_event() [sparc], fadump in
powerpc, etc.

So, this is a good case for the Google notifier as well - it's not
collecting data like the dmesg (hence your second bullet seems to not
apply here, info notifiers won't add info to be collected by gsmi). It
is a firmware/hypervisor/whatever-gsmi-is notification mechanism, that
tells such "lower" abstraction a panic occurred. It seems low risk and
we want it to run ASAP, if possible.

So, I'd like to keep it here, unless gsmi maintainers disagree or I'm
perhaps misunderstanding the meaning of this first list.
Cheers,


Guilherme


Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 07:58, Petr Mladek wrote:
> [...]
>> Thanks for the review Petr. Patch was already merged - my goal was to be
>> concise, i.e., a patch per driver / module, so the patch kinda fixes
>> whatever I think is wrong with the driver with regards panic handling.
>>
>> Do you think it worth to remove this patch from Greg's branch just to
>> split it in 2? Personally I think it's not worth, but opinions are welcome.
> 
> No problem. It is not worth the effort.
> 

OK, perfect!

Cheers,


Guilherme


Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-17 Thread Ricardo Neri
On Sat, May 14, 2022 at 10:17:38AM +0200, Thomas Gleixner wrote:
> On Fri, May 13 2022 at 14:19, Ricardo Neri wrote:
> > On Fri, May 06, 2022 at 11:41:13PM +0200, Thomas Gleixner wrote:
> >> The argument about not bloating the code
> >> with an "obvious???" function which is quite small is slightly beyond my
> >> comprehension level.
> >
> > That obvious function would look like this:
> >
> > void hpet_set_comparator_one_shot(int channel, u32 delta)
> > {
> > u32 count;
> >
> > count = hpet_readl(HPET_COUNTER);
> > count += delta;
> > hpet_writel(count, HPET_Tn_CMP(channel));
> > }
> 
> This function only works reliably when the delta is large. See
> hpet_clkevt_set_next_event().

That is a good point. One more reason to not have a
hpet_set_comparator_one_shot(), IMO.

Thanks and BR,
Ricardo


Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-17 Thread Ricardo Neri
On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> >> +  /*
> >> +   * If in use, the HPET hardlockup detector relies on tsc_khz.
> >> +   * Reconfigure it to make use of the refined tsc_khz.
> >> +   */
> >> +  lockup_detector_reconfigure();
> >
> > I don't know if the API is conceptually good.
> >
> > You change something that the lockup detector is currently using, 
> > *while* the detector is running asynchronously, and then reconfigure
> > it. What happens in the window? If this code is only used for small
> > adjustments maybe it does not really matter but in principle it's
> > a bad API to export.
> >
> > lockup_detector_reconfigure as an internal API is okay because it
> > reconfigures things while the watchdog is stopped [actually that
> > looks untrue for soft dog which uses watchdog_thresh in
> > is_softlockup(), but that should be fixed].
> >
> > You're the arch so you're allowed to stop the watchdog and configure
> > it, e.g., hardlockup_detector_perf_stop() is called in arch/.
> >
> > So you want to disable HPET watchdog if it was enabled, then update
> > wherever you're using tsc_khz, then re-enable.
> 
> The real question is whether making this refined tsc_khz value
> immediately effective matters at all. IMO, it does not because up to
> that point the watchdog was happily using the coarse calibrated value
> and the whole use TSC to assess whether the HPET fired mechanism is just
> a guestimate anyway. So what's the point of trying to guess 'more
> correct'.

In some of my test systems I observed that, the TSC value does not fall
within the expected error window the first time the HPET channel expires.

I inferred that the error computed using the coarser tsc_khz was wrong.
Recalculating the error window with refined tsc_khz would correct it.

However, restarting the timer has the side-effect of kicking the timer and,
therefore pushing the first HPET NMI further in the future.

Perhaps kicking HPET channel, not recomputing the error window, corrected
(masked?) the problem.

I will investigate further and rework or drop this patch as needed.

Thanks and BR,
Ricardo


Re: [PATCH v3] PCI/AER: Handle Multi UnCorrectable/Correctable errors properly

2022-05-17 Thread Bjorn Helgaas
On Wed, May 11, 2022 at 05:29:45PM -0700, Sathyanarayanan Kuppuswamy wrote:
> 
> 
> On 5/11/22 4:40 PM, Bjorn Helgaas wrote:
> > On Mon, Apr 18, 2022 at 03:02:37PM +, Kuppuswamy Sathyanarayanan wrote:
> > > Currently the aer_irq() handler returns IRQ_NONE for cases without bits
> > > PCI_ERR_ROOT_UNCOR_RCV or PCI_ERR_ROOT_COR_RCV are set. But this
> > > assumption is incorrect.
> > > 
> > > Consider a scenario where aer_irq() is triggered for a correctable
> > > error, and while we process the error and before we clear the error
> > > status in "Root Error Status" register, if the same kind of error
> > > is triggered again, since aer_irq() only clears events it saw, the
> > > multi-bit error is left in tact. This will cause the interrupt to fire
> > > again, resulting in entering aer_irq() with just the multi-bit error
> > > logged in the "Root Error Status" register.
> > > 
> > > Repeated AER recovery test has revealed this condition does happen
> > > and this prevents any new interrupt from being triggered. Allow to
> > > process interrupt even if only multi-correctable (BIT 1) or
> > > multi-uncorrectable bit (BIT 3) is set.
> > > 
> > > Also note that, for cases with only multi-bit error is set, since this
> > > is not the first occurrence of the error, PCI_ERR_ROOT_ERR_SRC may have
> > > zero or some junk value. So we cannot cleanly process this error
> > > information using aer_isr_one_error(). All we are attempting with this
> > > fix is to make sure error interrupt processing can continue in this
> > > scenario.
> > > 
> > > This error can be reproduced by making following changes to the
> > > aer_irq() function and by executing the given test commands.
> > > 
> > >   static irqreturn_t aer_irq(int irq, void *context)
> > >   struct aer_err_source e_src = {};
> > > 
> > >   pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS,
> > >   _src.status);
> > >   +   pci_dbg(pdev->port, "Root Error Status: %04x\n",
> > >   +   e_src.status);
> > >   if (!(e_src.status & AER_ERR_STATUS_MASK))
> > 
> > Do you mean
> > 
> >if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV)))
> > 
> > here?  AER_ERR_STATUS_MASK would be after this fix.
> 
> Yes. You are correct. Do you want me to update it and Fixes tag
> and send next version?

I moved the repro details to a bugzilla, updated the commit log as
below, and applied to pci/error for v5.19, thanks!


commit 203926da2bff ("PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits")
Author: Kuppuswamy Sathyanarayanan 
Date:   Mon Apr 18 15:02:37 2022 +

PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits

When a Root Port or Root Complex Event Collector receives an error Message
e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status
register and logs the Requester ID in the Error Source Identification
register.  If it receives a second ERR_COR Message before software clears
PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the
Requester ID is lost.

In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared:

  - hardware receives ERR_COR message
  - hardware sets PCI_ERR_ROOT_COR_RCV
  - aer_irq() entered
  - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
  - aer_irq(): now status == PCI_ERR_ROOT_COR_RCV
  - hardware receives second ERR_COR message
  - hardware sets PCI_ERR_ROOT_MULTI_COR_RCV
  - aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status)
  - PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set
  - aer_irq() entered again
  - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
  - aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV
  - aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set
  - PCI_ERR_ROOT_MULTI_COR_RCV is still set

The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and
PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV.

Fix the problem by queueing an AER event and clearing the Root Error Status
bits when any of these bits are set:

  PCI_ERR_ROOT_COR_RCV
  PCI_ERR_ROOT_UNCOR_RCV
  PCI_ERR_ROOT_MULTI_COR_RCV
  PCI_ERR_ROOT_MULTI_UNCOR_RCV

See the bugzilla link for details from Eric about how to reproduce this
problem.

[bhelgaas: commit log, move repro details to bugzilla]
Fixes: e167bfcaa4cd ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992
Link: 
https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppusw...@linux.intel.com
Reported-by: Eric Badger 
Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Bjorn Helgaas 
Reviewed-by: Ashok Raj 


Re: [PATCH] net: unexport csum_and_copy_{from,to}_user

2022-05-17 Thread Al Viro
On Thu, Apr 21, 2022 at 09:04:40AM +0200, Christoph Hellwig wrote:
> csum_and_copy_from_user and csum_and_copy_to_user are exported by
> a few architectures, but not actually used in modular code.  Drop
> the exports.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Al Viro 

Not sure which tree should it go through - Arnd's, perhaps?


Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-17 Thread Ricardo Neri
On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> > The HPET hardlockup detector relies on tsc_khz to estimate the value of
> > that the TSC will have when its HPET channel fires. A refined tsc_khz
> > helps to estimate better the expected TSC value.
> > 
> > Using the early value of tsc_khz may lead to a large error in the expected
> > TSC value. Restarting the NMI watchdog detector has the effect of kicking
> > its HPET channel and make use of the refined tsc_khz.
> > 
> > When the HPET hardlockup is not in use, restarting the NMI watchdog is
> > a noop.
> > 
> > Cc: Andi Kleen 
> > Cc: Stephane Eranian 
> > Cc: "Ravi V. Shankar" 
> > Cc: io...@lists.linux-foundation.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: x...@kernel.org
> > Signed-off-by: Ricardo Neri 
> > ---
> > Changes since v5:
> >  * Introduced this patch
> > 
> > Changes since v4
> >  * N/A
> > 
> > Changes since v3
> >  * N/A
> > 
> > Changes since v2:
> >  * N/A
> > 
> > Changes since v1:
> >  * N/A
> > ---
> >  arch/x86/kernel/tsc.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index cafacb2e58cc..cc1843044d88 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct 
> > work_struct *work)
> > /* Inform the TSC deadline clockevent devices about the recalibration */
> > lapic_update_tsc_freq();
> >  
> > +   /*
> > +* If in use, the HPET hardlockup detector relies on tsc_khz.
> > +* Reconfigure it to make use of the refined tsc_khz.
> > +*/
> > +   lockup_detector_reconfigure();
> 
> I don't know if the API is conceptually good.
> 
> You change something that the lockup detector is currently using, 
> *while* the detector is running asynchronously, and then reconfigure
> it. 

Yes, this is what happens.

> What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter

Please see my comment

> but in principle it's a bad API to export.
> 
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped

I see.

> [actually that  looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].

Perhaps there should be a watchdog_thresh_user. When the user updates it,
the detector is stopped, watchdog_thresh is updated, and then the detector
is resumed.

> 
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.

I had it like this but it did not look right to me. You are right, however,
I can stop/restart the watchdog when needed.

Thanks and BR,
Ricardo


[powerpc:next-test 139/145] arch/powerpc/sysdev/mpc5xxx_clocks.c:26:2: error: call to undeclared function 'fwnode_for_each_parent_node'; ISO C99 and later do not support implicit function declarations

2022-05-17 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   2e4a9942261f89ad204a8189634029a4b1f0efb6
commit: f9724684ea2198caaab132514ca104175ed2f15f [139/145] powerpc/mpc5xxx: 
Switch mpc5xxx_get_bus_frequency() to use fwnode
config: powerpc-mpc512x_defconfig 
(https://download.01.org/0day-ci/archive/20220518/202205180443.cfi2jikj-...@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 
853fa8ee225edf2d0de94b0dcbd31bea916e825e)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=f9724684ea2198caaab132514ca104175ed2f15f
git remote add powerpc 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
git fetch --no-tags powerpc next-test
git checkout f9724684ea2198caaab132514ca104175ed2f15f
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/powerpc/sysdev/mpc5xxx_clocks.c:26:2: error: call to undeclared 
>> function 'fwnode_for_each_parent_node'; ISO C99 and later do not support 
>> implicit function declarations [-Wimplicit-function-declaration]
   fwnode_for_each_parent_node(fwnode, parent) {
   ^
>> arch/powerpc/sysdev/mpc5xxx_clocks.c:26:45: error: expected ';' after 
>> expression
   fwnode_for_each_parent_node(fwnode, parent) {
  ^
  ;
   2 errors generated.


vim +/fwnode_for_each_parent_node +26 arch/powerpc/sysdev/mpc5xxx_clocks.c

 8  
 9  /**
10   * mpc5xxx_fwnode_get_bus_frequency - Find the bus frequency for a 
firmware node
11   * @fwnode: firmware node
12   *
13   * Returns bus frequency (IPS on MPC512x, IPB on MPC52xx),
14   * or 0 if the bus frequency cannot be found.
15   */
16  unsigned long mpc5xxx_fwnode_get_bus_frequency(struct fwnode_handle 
*fwnode)
17  {
18  struct fwnode_handle *parent;
19  u32 bus_freq;
20  int ret;
21  
22  ret = fwnode_property_read_u32(fwnode, "bus-frequency", 
_freq);
23  if (!ret)
24  return bus_freq;
25  
  > 26  fwnode_for_each_parent_node(fwnode, parent) {

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


Looking for old boards with Galileo / Marvell Discovery chips on them.

2022-05-17 Thread Steven J. Hill

Hello.

Currently looking for old EV64260 and EV64360 development boards. If you 
have any to sell, please contact me. Tried contacting Mark Greer of list, 
but not sure if he got any of my email. Oh, also looking for a BDI2000 or 
BDI3000 if you have one to sell. Thank you.


-Steve


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category

2022-05-17 Thread Ricardo Neri
On Mon, May 09, 2022 at 03:59:40PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> > Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> > is to be used with the HPET-based hardlockup detector. This detector
> > does not have a direct way of checking if the HPET timer is the source of
> > the NMI. Instead, it indirectly estimates it using the time-stamp counter.
> >
> > Therefore, we may have false-positives in case another NMI occurs within
> > the estimated time window. For this reason, we want the handler of the
> > detector to be called after all the NMI_LOCAL handlers. A simple way
> > of achieving this with a new NMI handler category.
> >
> > @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs 
> > *regs)
> > }
> > raw_spin_unlock(_reason_lock);
> >  
> > +   handled = nmi_handle(NMI_WATCHDOG, regs);
> > +   if (handled == NMI_HANDLED)
> > +   goto out;
> > +
> 
> How is this supposed to work reliably?
> 
> If perf is active and the HPET NMI and the perf NMI come in around the
> same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
> won't be checked. Because MSI is strictly edge and the message is only
> sent once, this can result in a stale watchdog, no?

This is true. Instead, at the end of each NMI I should _also_ check if the TSC
is within the expected value of the HPET NMI watchdog. In this way, unrelated
NMIs (e.g., perf NMI) are handled and we don't miss the NMI from the HPET
channel.

Thanks and BR,
Ricardo


[powerpc:next-test 141/145] arch/powerpc/kernel/trace/ftrace.c:714:6: error: redefinition of 'ftrace_free_init_tramp'

2022-05-17 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   2e4a9942261f89ad204a8189634029a4b1f0efb6
commit: 0cad223e88229b1f021f9e167bdb2ba7b0d0c369 [141/145] powerpc/ftrace: 
Remove ftrace init tramp once kernel init is complete
config: powerpc-buildonly-randconfig-r005-20220516 
(https://download.01.org/0day-ci/archive/20220518/202205180129.raxtjqf6-...@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=0cad223e88229b1f021f9e167bdb2ba7b0d0c369
git remote add powerpc 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
git fetch --no-tags powerpc next-test
git checkout 0cad223e88229b1f021f9e167bdb2ba7b0d0c369
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 
O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/powerpc/kernel/trace/ftrace.c:714:6: error: redefinition of 
>> 'ftrace_free_init_tramp'
 714 | void ftrace_free_init_tramp(void)
 |  ^~
   In file included from include/linux/ftrace.h:23,
from arch/powerpc/kernel/trace/ftrace.c:20:
   arch/powerpc/include/asm/ftrace.h:119:20: note: previous definition of 
'ftrace_free_init_tramp' with type 'void(void)'
 119 | static inline void ftrace_free_init_tramp(void) { }
 |^~


vim +/ftrace_free_init_tramp +714 arch/powerpc/kernel/trace/ftrace.c

   713  
 > 714  void ftrace_free_init_tramp(void)
   715  {
   716  int i;
   717  
   718  for (i = 0; i < NUM_FTRACE_TRAMPS && ftrace_tramps[i]; i++)
   719  if (ftrace_tramps[i] == (unsigned 
long)ftrace_tramp_init) {
   720  ftrace_tramps[i] = 0;
   721  return;
   722  }
   723  }
   724  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


Re: [PATCH v2 4/4] powerpc/52xx: Convert to use fwnode API

2022-05-17 Thread Andy Shevchenko
On Tue, May 17, 2022 at 09:38:56AM +1000, Michael Ellerman wrote:
> Andy Shevchenko  writes:
> > On Mon, May 16, 2022 at 05:05:12PM +0300, Andy Shevchenko wrote:
> >> On Mon, May 16, 2022 at 11:48:05PM +1000, Michael Ellerman wrote:
> >> > Andy Shevchenko  writes:
> >> > > We may convert the GPT driver to use fwnode API for the sake
> >> > > of consistency of the used APIs inside the driver.
> >> > 
> >> > I'm not sure about this one.
> >> > 
> >> > It's more consistent to use fwnode in this driver, but it's very
> >> > inconsistent with the rest of the powerpc code. We have basically no
> >> > uses of the fwnode APIs at the moment.
> >> 
> >> Fair point!
> >> 
> >> > It seems like a pretty straight-forward conversion, but there could
> >> > easily be a bug in there, I don't have any way to test it. Do you?
> >> 
> >> Nope, only compile testing. The important part of this series is to
> >> clean up of_node from GPIO library, so since here it's a user of
> >> it I want to do that. This patch is just ad-hoc conversion that I
> >> noticed is possible. But there is no any requirement to do so.
> >> 
> >> Lemme drop this from v3.
> >
> > I just realize that there is no point to send a v3. You can just apply
> > first 3 patches. Or is your comment against entire series?
> 
> No, my comment is just about this patch.
> 
> I don't mind converting to new APIs when it's blocking some other
> cleanup. But given the age of this code I think it's probably better to
> just leave the rest of it as-is, unless someone volunteers to test it.
> 
> So yeah I'll just take patches 1-3 of this v2 series, no need to resend.

Thanks!

One note though, the fwnode_for_each_parent_node() is not yet available in
upstream, but will be after v5.19-rc1. It means the patch 3 can't be applied
without that. That's why LKP complained on patch 4 in this series.

That said, the easiest way is to postpone it till v5.19-rc1 is out.

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-17 Thread Eric W. Biederman



Looking at this the pr_err is absolutely needed.  If an unsupported case
winds up in the purgatory blob and the code can't handle it things
will fail silently much worse later.  So the proposed patch is
unfortunately the wrong direction.

"Naveen N. Rao"  writes:

> Baoquan He wrote:
>> On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
>>> kexec_load_purgatory() can fail for many reasons - there is no need to
>>> print an error when encountering unsupported relocations.
>>> This solves a build issue on powerpc with binutils v2.36 and newer [1].
>>> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
>>> symbols") [2], binutils started dropping section symbols that it thought
>> I am not familiar with binutils, while wondering if this exists in other
>> ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
>> have problem with it?
>
> I'm not aware of this specific file causing a problem on other architectures -
> perhaps the config options differ enough. There are however more reports of
> similar issues affecting other architectures with the llvm integrated 
> assembler:
> https://github.com/ClangBuiltLinux/linux/issues/981
>
>> 
>>> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
>>> is placing kexec_arch_apply_relocations[_add] into a separate
>>> .text.unlikely section and the section symbol ".text.unlikely" is being
>>> dropped. Due to this, recordmcount is unable to find a non-weak symbol
>> But arch_kexec_apply_relocations_add is weak symbol on ppc.
>
> Yes. Note that it is just the section symbol that gets dropped. The section is
> still present and will continue to hold the symbols for the functions
> themselves.

So we have a case where binutils thinks it is doing something useful
and our kernel specific tool gets tripped up by it.

Reading the recordmcount code it looks like it is finding any symbol
within a section but ignoring weak symbols.  So I suspect the only
remaining symbol in the section is __weak and that confuses
recordmcount.

Does removing the __weak annotation on those functions fix the build
error?  If so we can restructure the kexec code to simply not use __weak
symbols.

Otherwise the fix needs to be in recordmcount or binutils, and we should
loop whoever maintains recordmcount in to see what they can do.

Eric


Re: [PATCH 17/30] tracing: Improve panic/die notifiers

2022-05-17 Thread Guilherme G. Piccoli
On 11/05/2022 08:45, Petr Mladek wrote:
> [...]
> DIE_OOPS and PANIC_NOTIFIER are from different enum.
> It feels like comparing apples with oranges here.
> 
> IMHO, the proper way to unify the two notifiers is
> a check of the @self parameter. Something like:
> 
> static int trace_die_panic_handler(struct notifier_block *self,
>   unsigned long ev, void *unused)
> {
>   if (self == trace_die_notifier && val != DIE_OOPS)
>   goto out;
> 
>   ftrace_dump(ftrace_dump_on_oops);
> out:
>   return NOTIFY_DONE;
> }
> 
> Best Regards,
> Petr

OK Petr, thanks - will implement your suggestion in V2 (CC Steven)

Cheers!


Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

2022-05-17 Thread Guilherme G. Piccoli
On 10/05/2022 12:28, Petr Mladek wrote:
> [...]
> IMHO, the check of the @self parameter was the proper solution.
> 
> "gisb_die_notifier" list uses @val from enum die_val.
> "gisb_panic_notifier" list uses @val from enum panic_notifier_val.
> 
> These are unrelated types. It might easily break when
> someone defines the same constant also in enum die_val.
> 
> Best Regards,
> Petr

OK Petr, I'll drop this idea for V2 - will just remove the useless
header / prototype then. (CC Florian)


Cheers,


Guilherme


Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 10:11, Petr Mladek wrote:
> [...]
>> You mentioned 2 cases:
>>
>> (a) Same notifier_list used in different situations;
>>
>> (b) Same *notifier callback* used in different lists;
>>
>> Mine is case (b), right? Can you show me an example of case (a)?
> 
> There are many examples of case (a):
> 
> [... snip ...] 
> These all call the same list/chain in different situations.
> The situation is distinguished by @val.
> 
> 
>> You can see in the following patches (or grep the kernel) that people are 
>> using
>> this identification parameter to determine which kind of OOPS trigger
>> the callback to condition the execution of the function to specific
>> cases.
> 
> Could you please show me some existing code for case (b)?
> I am not able to find any except in your patches.
> 

Hi Petr, thanks for the examples - I agree with you. In the end, seems
I'm kind of abusing the API. This id is used to distinguish different
situations in which the callback is called, but in the same
"realm"/notifier list.

In my case I have different list calling the same callback and
(ab-)using the id to make distinction. I can rework the patches using
pointer comparison, it's fine =)

So, I'll drop this patch in V2.

> Anyway, the solution in 16th patch is bad, definitely.
> hv_die_panic_notify_crash() uses "val" to disinguish
> both:
> 
>  + "panic_notifier_list" vs "die_chain"
>  + die_val when callen via "die_chain"
> 
> The API around "die_chain" API is not aware of enum panic_notifier_val
> and the API using "panic_notifier_list" is not aware of enum die_val.
> As I said, it is mixing apples and oranges and it is error prone.
> 

OK, I'll re-work that patch - there's more there to be changed, that one
is complex heheh

Cheers!


Re: [PATCH 2/2] powerpc/perf: Fix the threshold compare group constraint for power9

2022-05-17 Thread Athira Rajeev



> On 06-May-2022, at 11:40 AM, Kajol Jain  wrote:
> 
> Thresh compare bits for a event is used to program thresh compare
> field in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9).
> When scheduling events as a group, all events in that group should
> match value in threshold bits (like thresh compare, thresh control,
> thresh select). Otherwise event open for the sibling events should fail.
> But in the current code, incase thresh compare bits are not valid,
> we are not failing in group_constraint function which can result
> in invalid group schduling.
> 
> Fix the issue by returning -1 incase event is threshold and threshold
> compare value is not valid.
> 
> Thresh control bits in the event code is used to program thresh_ctl
> field in Monitor Mode Control Register A (MMCRA: 48-55). In below example,
> the scheduling of group events PM_MRK_INST_CMPL (873534401e0) and
> PM_THRESH_MET (8734340101ec) is expected to fail as both event
> request different thresh control bits and invalid thresh compare value.
> 
> Result before the patch changes:
> 
> [command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1
> 
> Performance counter stats for 'sleep 1':
> 
>11,048  r8735340401e0
> 1,967  r8734340101ec
> 
>   1.001354036 seconds time elapsed
> 
>   0.001421000 seconds user
>   0.0 seconds sys
> 
> Result after the patch changes:
> 
> [command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (r8735340401e0).
> /bin/dmesg | grep -i perf may provide additional information.
> 
> Fixes: 78a16d9fc1206 ("powerpc/perf: Avoid FAB_*_MATCH checks for power9")
> Signed-off-by: Kajol Jain 

Reviewed-by: Athira Rajeev 

Thanks
Athira
> ---
> arch/powerpc/perf/isa207-common.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/perf/isa207-common.c 
> b/arch/powerpc/perf/isa207-common.c
> index 013b06af6fe6..bb5d64862bc9 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -508,7 +508,8 @@ int isa207_get_constraint(u64 event, unsigned long 
> *maskp, unsigned long *valp,
>   if (event_is_threshold(event) && is_thresh_cmp_valid(event)) {
>   mask  |= CNST_THRESH_MASK;
>   value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT);
> - }
> + } else if (event_is_threshold(event))
> + return -1;
>   } else {
>   /*
>* Special case for PM_MRK_FAB_RSP_MATCH and 
> PM_MRK_FAB_RSP_MATCH_CYC,
> -- 
> 2.31.1
> 



Re: [PATCH] powerpc/irq: Remove arch_local_irq_restore() for !CONFIG_CC_HAS_ASM_GOTO

2022-05-17 Thread Michael Ellerman
Christophe Leroy  writes:
> All supported versions of GCC support asm goto.

I thought clang was the one that only recently added support for asm
goto.



Apparently clang added support in 2019, in clang 9. The earliest clang
we claim to support is 11.

So this patch is good, I'll just adjust the change log to say GCC/clang.

cheers

> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index c768cde03e36..dd09919c3c66 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -216,7 +216,6 @@ static inline void replay_soft_interrupts_irqrestore(void)
>  #define replay_soft_interrupts_irqrestore() replay_soft_interrupts()
>  #endif
>  
> -#ifdef CONFIG_CC_HAS_ASM_GOTO
>  notrace void arch_local_irq_restore(unsigned long mask)
>  {
>   unsigned char irq_happened;
> @@ -312,82 +311,6 @@ notrace void arch_local_irq_restore(unsigned long mask)
>   __hard_irq_enable();
>   preempt_enable();
>  }
> -#else
> -notrace void arch_local_irq_restore(unsigned long mask)
> -{
> - unsigned char irq_happened;
> -
> - /* Write the new soft-enabled value */
> - irq_soft_mask_set(mask);
> - if (mask)
> - return;
> -
> - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> - WARN_ON_ONCE(in_nmi() || in_hardirq());
> -
> - /*
> -  * From this point onward, we can take interrupts, preempt,
> -  * etc... unless we got hard-disabled. We check if an event
> -  * happened. If none happened, we know we can just return.
> -  *
> -  * We may have preempted before the check below, in which case
> -  * we are checking the "new" CPU instead of the old one. This
> -  * is only a problem if an event happened on the "old" CPU.
> -  *
> -  * External interrupt events will have caused interrupts to
> -  * be hard-disabled, so there is no problem, we
> -  * cannot have preempted.
> -  */
> - irq_happened = get_irq_happened();
> - if (!irq_happened) {
> - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> - WARN_ON_ONCE(!(mfmsr() & MSR_EE));
> - return;
> - }
> -
> - /* We need to hard disable to replay. */
> - if (!(irq_happened & PACA_IRQ_HARD_DIS)) {
> - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> - WARN_ON_ONCE(!(mfmsr() & MSR_EE));
> - __hard_irq_disable();
> - local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
> - } else {
> - /*
> -  * We should already be hard disabled here. We had bugs
> -  * where that wasn't the case so let's dbl check it and
> -  * warn if we are wrong. Only do that when IRQ tracing
> -  * is enabled as mfmsr() can be costly.
> -  */
> - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
> - if (WARN_ON_ONCE(mfmsr() & MSR_EE))
> - __hard_irq_disable();
> - }
> -
> - if (irq_happened == PACA_IRQ_HARD_DIS) {
> - local_paca->irq_happened = 0;
> - __hard_irq_enable();
> - return;
> - }
> - }
> -
> - /*
> -  * Disable preempt here, so that the below preempt_enable will
> -  * perform resched if required (a replayed interrupt may set
> -  * need_resched).
> -  */
> - preempt_disable();
> - irq_soft_mask_set(IRQS_ALL_DISABLED);
> - trace_hardirqs_off();
> -
> - replay_soft_interrupts_irqrestore();
> - local_paca->irq_happened = 0;
> -
> - trace_hardirqs_on();
> - irq_soft_mask_set(IRQS_ENABLED);
> - __hard_irq_enable();
> - preempt_enable();
> -}
> -#endif
>  EXPORT_SYMBOL(arch_local_irq_restore);
>  
>  /*
> -- 
> 2.35.1


Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()

2022-05-17 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 15/05/2022 à 12:28, Michael Ellerman a écrit :
>> On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote:
>>> This series reduces by 70% the time required to activate
>>> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.
>>>
>>> Measure is performed in function ftrace_replace_code() using mftb()
>>> around the loop.
>>>
>>> With the series,
>>> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
>>> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.
>>>
>>> [...]
>> 
>> Patches 1, 3 and 4 applied to powerpc/next.
>> 
>> [1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without 
>> CONFIG_MODULES
>>
>> https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596
>> [3/4] powerpc/code-patching: Use jump_label for testing freed initmem
>>
>> https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6
>> [4/4] powerpc/code-patching: Use jump_label to check if poking_init() is done
>>
>> https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a
>> 
>
> Patch 2 was the keystone of this series. What happened to it ?

It broke on 64-bit. I think I know why but I haven't had time to test
it. Will try and get it fixed in the next day or two.

cheers


Re: [PATCH] powerpc/vdso: Fix incorrect CFI in gettimeofday.S

2022-05-17 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> Michael Ellerman wrote:
>>
>> diff --git a/arch/powerpc/kernel/vdso/gettimeofday.S 
>> b/arch/powerpc/kernel/vdso/gettimeofday.S
>> index eb9c81e1c218..0aee255e9cbb 100644
>> --- a/arch/powerpc/kernel/vdso/gettimeofday.S
>> +++ b/arch/powerpc/kernel/vdso/gettimeofday.S
>> @@ -22,12 +22,15 @@
>>  .macro cvdso_call funct call_time=0
>>.cfi_startproc
>>  PPC_STLUr1, -PPC_MIN_STKFRM(r1)
>> +  .cfi_adjust_cfa_offset PPC_MIN_STKFRM
>>  mflrr0
>> -  .cfi_register lr, r0
>>  PPC_STLUr1, -PPC_MIN_STKFRM(r1)
>> +  .cfi_adjust_cfa_offset PPC_MIN_STKFRM
>>  PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)
>
> 
>
>> @@ -46,6 +50,7 @@
>>  mtlrr0
>>.cfi_restore lr
>>  addir1, r1, 2 * PPC_MIN_STKFRM
>> +  .cfi_def_cfa_offset 0
>
> Should this be .cfi_adjust_cfa_offset, given that we used that at the
> start of the function?
 
AIUI "adjust x" is offset += x, whereas "def x" is offset = x.

So we could use adjust here, but we'd need to adjust by -(2 * PPC_MIN_STKFRM).

It seemed clearer to just set the offset back to 0, which is what it is
at the start of the function.

But I'm not a CFI expert at all, so I'll defer to anyone else who has an
opinion :)

cheers


Re: [PATCH -next] powerpc/book3e: Fix build error

2022-05-17 Thread Christophe Leroy


Le 17/05/2022 à 11:48, YueHaibing a écrit :
> arch/powerpc/mm/nohash/fsl_book3e.c: In function ‘relocate_init’:
> arch/powerpc/mm/nohash/fsl_book3e.c:348:2: error: implicit declaration of 
> function ‘early_get_first_memblock_info’ 
> [-Werror=implicit-function-declaration]
>early_get_first_memblock_info(__va(dt_ptr), );
>^
> 
> Add missing include file linux/of_fdt.h to fix this.
> 
> Signed-off-by: YueHaibing 

Thats for fixing that.

Reviewed-by: Christophe Leroy 

It means we don't have any defconfig for 32 bits booke with 
CONFIG_RELOCATABLE ?

> ---
>   arch/powerpc/mm/nohash/fsl_book3e.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/mm/nohash/fsl_book3e.c 
> b/arch/powerpc/mm/nohash/fsl_book3e.c
> index 08a984e29433..093da4dbdee1 100644
> --- a/arch/powerpc/mm/nohash/fsl_book3e.c
> +++ b/arch/powerpc/mm/nohash/fsl_book3e.c
> @@ -36,6 +36,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   
>   #include 
>   #include 

Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-17 Thread Naveen N. Rao

Baoquan He wrote:

On 04/25/22 at 11:11pm, Naveen N. Rao wrote:

kexec_load_purgatory() can fail for many reasons - there is no need to
print an error when encountering unsupported relocations.

This solves a build issue on powerpc with binutils v2.36 and newer [1].
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought


I am not familiar with binutils, while wondering if this exists in other
ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
have problem with it?


I'm not aware of this specific file causing a problem on other 
architectures - perhaps the config options differ enough. There are 
however more reports of similar issues affecting other architectures 
with the llvm integrated assembler:

https://github.com/ClangBuiltLinux/linux/issues/981




were unused.  This isn't an issue in general, but with kexec_file.c, gcc
is placing kexec_arch_apply_relocations[_add] into a separate
.text.unlikely section and the section symbol ".text.unlikely" is being
dropped. Due to this, recordmcount is unable to find a non-weak symbol


But arch_kexec_apply_relocations_add is weak symbol on ppc.


Yes. Note that it is just the section symbol that gets dropped. The 
section is still present and will continue to hold the symbols for the 
functions themselves.





in .text.unlikely to generate a relocation record against. Dropping
pr_err() calls results in these functions being left in .text section,


Why dropping pr_err() can make arch_kexec_apply_relocations_add put in
.text?


I'm not actually sure, though Josh suspected that printk() might be 
cold:

http://lkml.kernel.org/r/20210214155147.3owdimqv2lyhu6by@treble


- Naveen



Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-17 Thread Baoquan He
On 04/25/22 at 11:11pm, Naveen N. Rao wrote:
> kexec_load_purgatory() can fail for many reasons - there is no need to
> print an error when encountering unsupported relocations.
> 
> This solves a build issue on powerpc with binutils v2.36 and newer [1].
> Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
> symbols") [2], binutils started dropping section symbols that it thought

I am not familiar with binutils, while wondering if this exists in other
ARCHes except of ppc. Arm64 doesn't have the ARCH override either, do we
have problem with it?

> were unused.  This isn't an issue in general, but with kexec_file.c, gcc
> is placing kexec_arch_apply_relocations[_add] into a separate
> .text.unlikely section and the section symbol ".text.unlikely" is being
> dropped. Due to this, recordmcount is unable to find a non-weak symbol

But arch_kexec_apply_relocations_add is weak symbol on ppc.

> in .text.unlikely to generate a relocation record against. Dropping
> pr_err() calls results in these functions being left in .text section,

Why dropping pr_err() can make arch_kexec_apply_relocations_add put in
.text?

> enabling recordmcount to emit a proper relocation record.
> 
> [1] https://github.com/linuxppc/issues/issues/388
> [2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
> 
> Signed-off-by: Naveen N. Rao 
> ---
>  kernel/kexec_file.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 8347fc158d2b96..55d144c58b5278 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -121,7 +121,6 @@ int __weak
>  arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr 
> *section,
>const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> - pr_err("RELA relocation unsupported.\n");
>   return -ENOEXEC;
>  }
>  
> @@ -138,7 +137,6 @@ int __weak
>  arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section,
>const Elf_Shdr *relsec, const Elf_Shdr *symtab)
>  {
> - pr_err("REL relocation unsupported.\n");
>   return -ENOEXEC;
>  }
>  
> 
> base-commit: 83d8a0d166119de813cad27ae7d61f54f9aea707
> -- 
> 2.35.1
> 



[Bug 215389] pagealloc: memory corruption at building glibc-2.33 and running its' testsuite

2022-05-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215389

--- Comment #26 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Note that THREAD_SHIFT is set to 14 when using KASAN:

config THREAD_SHIFT
int "Thread shift" if EXPERT
range 13 15
default "15" if PPC_256K_PAGES
default "14" if PPC64
default "14" if KASAN
default "13"
help
  Used to define the stack size. The default is almost always what you
  want. Only change this if you know what you are doing.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 215389] pagealloc: memory corruption at building glibc-2.33 and running its' testsuite

2022-05-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215389

--- Comment #25 from Christophe Leroy (christophe.le...@csgroup.eu) ---
The Kernel stack overflow looks odd.

Value of R1 is wrong and LR is NULL. Don't know how we ended up here, but
probably not by a real stack overflow.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 215389] pagealloc: memory corruption at building glibc-2.33 and running its' testsuite

2022-05-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215389

--- Comment #24 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Seems like with Inline KASAN your kernel is far too big compared to what we
support at the time being:

c2468000 T __end_rodata
c280 T __init_begin
c280 T _sinittext

c2801644 T prom_init

The init text section is behind the 32Mbytes boundary, it means that prom_init
and other functions are not called anymore directly but via a trampoline.

c00c <__start>:
c00c:   2c 05 00 00 cmpwi   r5,0
c010:   41 82 00 1c beq c02c <__start+0x20>
c014:   42 9f 00 05 bcl 20,4*cr7+so,c018 <__start+0xc>
c018:   7d 08 02 a6 mflrr8
c01c:   3d 08 00 00 addis   r8,r8,0
c020:   39 08 ff e8 addir8,r8,-24
c024:   48 00 38 e5 bl  c0003908 
...
c0003908:   3d 80 c2 80 lis r12,-15744
c000390c:   39 8c 16 44 addir12,r12,5700
c0003910:   7d 89 03 a6 mtctr   r12
c0003914:   4e 80 04 20 bctr


And it cannot work because at that time the kernel is not yet relocated to its
final location.

There was the same problem with PPC64 and it was fix by 24d33ac5b8ff
("powerpc/64s: Make prom_init require RELOCATABLE").

Don't know if a similar approach could work.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH 3/3] [RFC] powerpc: Book3S 64-bit outline-only KASAN support

2022-05-17 Thread Christophe Leroy


Le 17/05/2022 à 09:31, Paul Mackerras a écrit :
> On Sun, May 15, 2022 at 07:33:52AM +, Christophe Leroy wrote:
>>
>>
>> Le 11/05/2022 à 09:28, Paul Mackerras a écrit :
>>> From: Daniel Axtens 
>>>
>>> Implement a limited form of KASAN for Book3S 64-bit machines running under
>>> the Radix MMU, supporting only outline mode.
> 
> [snip]
> 
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index b779603978e1..0bf76e40c9c2 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -105,6 +105,7 @@ config PPC
>>> # Please keep this list sorted alphabetically.
>>> #
>>> select ARCH_32BIT_OFF_T if PPC32
>>> +   select ARCH_DISABLE_KASAN_INLINEif PPC_RADIX_MMU
>>> select ARCH_ENABLE_MEMORY_HOTPLUG
>>> select ARCH_ENABLE_MEMORY_HOTREMOVE
>>> select ARCH_HAS_COPY_MC if PPC64
>>> @@ -152,6 +153,7 @@ config PPC
>>> select ARCH_WANT_IPC_PARSE_VERSION
>>> select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>>> select ARCH_WANT_LD_ORPHAN_WARN
>>> +   select ARCH_WANTS_NO_INSTR
>>
>> Can you explain why we need that ?
> 
> The help text for the option says:
> 
>   An architecture should select this if the noinstr macro is being used on
>   functions to denote that the toolchain should avoid instrumenting such
>   functions and is required for correctness.
> 
> All it really seems to do is restrict the conditions under which the
> GCOV and KCOV options can be enabled.
> 
>> Is it tied to KASAN ? Is yes, why didn't we have it for PPC32 until now ?
> 
> Since we really do need to avoid instrumenting certain functions on
> ppc64 (as in things will break if we do instrument them), I think we
> need to select ARCH_WANTS_NO_INSTR.
> 
> For ppc32, as far as I recall there is much less code that runs in
> real mode and it is mostly assembler (except for some boot code),
> mostly because all addresses have to be explicitly translated to
> physical addresses for 32-bit real-mode code, unlike ppc64 where we
> can use access memory in the linear mapping using virtual addresses
> because of the fact that the CPU ignores the top 4 bits of the
> effective address in real mode.  That said, there is a lot less code
> that runs in real mode on ppc64 than there used to be.
> 
>> Maybe that's independant of KASAN and would be worth a separate patch ?
> 
> Yes, possibly, though KASAN does appear to be the only user of noinstr
> in arch/powerpc.
> 
> [snip]
> 
>>> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
>>> b/arch/powerpc/include/asm/book3s/64/radix.h
>>> index d090d9612348..bafc9869afcd 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/radix.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
>>> @@ -35,6 +35,11 @@
>>>#define RADIX_PMD_SHIFT  (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
>>>#define RADIX_PUD_SHIFT  (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
>>>#define RADIX_PGD_SHIFT  (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
>>> +
>>> +#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE)
>>> +#define R_PTRS_PER_PMD (1 << RADIX_PMD_INDEX_SIZE)
>>> +#define R_PTRS_PER_PUD (1 << RADIX_PUD_INDEX_SIZE)
>>> +
>>>/*
>>> * Size of EA range mapped by our pagetables.
>>> */
>>> @@ -68,11 +73,11 @@
>>> *
>>> *
>>> * 3rd quadrant expanded:
>>> - * +--+
>>> + * +--+  Highest address (0xc010)
>>> + * +--+  KASAN shadow end (0xc00fc000)
>>> * |  |
>>> * |  |
>>> - * |  |
>>> - * +--+  Kernel vmemmap end 
>>> (0xc010)
>>> + * +--+  Kernel vmemmap end/shadow start 
>>> (0xc00e)
>>> * |  |
>>> * |   512TB   |
>>> * |  |
>>> @@ -126,6 +131,8 @@
>>>#define RADIX_VMEMMAP_SIZE   RADIX_KERN_MAP_SIZE
>>>#define RADIX_VMEMMAP_END(RADIX_VMEMMAP_START + 
>>> RADIX_VMEMMAP_SIZE)
>>>
>>> +/* For the sizes of the shadow area, see kasan.h */
>>> +
>>
>> Why does this comment pops up here ?
> 
> Do you mean, why is it there at all, or do you mean why is it in that
> particular place rather than closer to the "3rd quadrant expanded"
> diagram or something?

Why at this place mainly, I mean I can't see the relationship between 
the added comment and the following lines starting with a #ifndef ASSEMBLY.

If the comment is to be usefull, it should be added to a related place I 
guess.

> 
>>>#ifndef __ASSEMBLY__
>>>#define RADIX_PTE_TABLE_SIZE (sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
>>>#define RADIX_PMD_TABLE_SIZE (sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
>>> diff --git a/arch/powerpc/include/asm/interrupt.h 
>>> b/arch/powerpc/include/asm/interrupt.h
>>> index 

Re: [PATCH] powerpc/vdso: Fix incorrect CFI in gettimeofday.S

2022-05-17 Thread Naveen N. Rao

Michael Ellerman wrote:


diff --git a/arch/powerpc/kernel/vdso/gettimeofday.S 
b/arch/powerpc/kernel/vdso/gettimeofday.S
index eb9c81e1c218..0aee255e9cbb 100644
--- a/arch/powerpc/kernel/vdso/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso/gettimeofday.S
@@ -22,12 +22,15 @@
 .macro cvdso_call funct call_time=0
   .cfi_startproc
PPC_STLUr1, -PPC_MIN_STKFRM(r1)
+  .cfi_adjust_cfa_offset PPC_MIN_STKFRM
mflrr0
-  .cfi_register lr, r0
PPC_STLUr1, -PPC_MIN_STKFRM(r1)
+  .cfi_adjust_cfa_offset PPC_MIN_STKFRM
PPC_STL r0, PPC_MIN_STKFRM + PPC_LR_STKOFF(r1)





@@ -46,6 +50,7 @@
mtlrr0
   .cfi_restore lr
addir1, r1, 2 * PPC_MIN_STKFRM
+  .cfi_def_cfa_offset 0


Should this be .cfi_adjust_cfa_offset, given that we used that at the 
start of the function?



- Naveen



Re: [PATCH] kexec_file: Drop pr_err in weak implementations of arch_kexec_apply_relocations[_add]

2022-05-17 Thread Naveen N. Rao

Naveen N. Rao wrote:

kexec_load_purgatory() can fail for many reasons - there is no need to
print an error when encountering unsupported relocations.

This solves a build issue on powerpc with binutils v2.36 and newer [1].
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [2], binutils started dropping section symbols that it thought
were unused.  This isn't an issue in general, but with kexec_file.c, gcc
is placing kexec_arch_apply_relocations[_add] into a separate
.text.unlikely section and the section symbol ".text.unlikely" is being
dropped. Due to this, recordmcount is unable to find a non-weak symbol
in .text.unlikely to generate a relocation record against. Dropping
pr_err() calls results in these functions being left in .text section,
enabling recordmcount to emit a proper relocation record.

[1] https://github.com/linuxppc/issues/issues/388
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

Signed-off-by: Naveen N. Rao 
---
 kernel/kexec_file.c | 2 --
 1 file changed, 2 deletions(-)


Eric,
Any comments on this? There have been many reports of build breakages 
due to this.


FWIW, there have been similar fixes in the past:
- 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/init/initramfs.c?id=55d5b7dd6451b58489ce384282ca5a4a289eb8d5
- 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/elfcore.h?id=6e7b64b9dd6d96537d816ea07ec26b7dedd397b9


- Naveen



Re: [PATCH 3/3] [RFC] powerpc: Book3S 64-bit outline-only KASAN support

2022-05-17 Thread Paul Mackerras
On Sun, May 15, 2022 at 07:33:52AM +, Christophe Leroy wrote:
> 
> 
> Le 11/05/2022 à 09:28, Paul Mackerras a écrit :
> > From: Daniel Axtens 
> > 
> > Implement a limited form of KASAN for Book3S 64-bit machines running under
> > the Radix MMU, supporting only outline mode.

[snip]

> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index b779603978e1..0bf76e40c9c2 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -105,6 +105,7 @@ config PPC
> > # Please keep this list sorted alphabetically.
> > #
> > select ARCH_32BIT_OFF_T if PPC32
> > +   select ARCH_DISABLE_KASAN_INLINEif PPC_RADIX_MMU
> > select ARCH_ENABLE_MEMORY_HOTPLUG
> > select ARCH_ENABLE_MEMORY_HOTREMOVE
> > select ARCH_HAS_COPY_MC if PPC64
> > @@ -152,6 +153,7 @@ config PPC
> > select ARCH_WANT_IPC_PARSE_VERSION
> > select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
> > select ARCH_WANT_LD_ORPHAN_WARN
> > +   select ARCH_WANTS_NO_INSTR
> 
> Can you explain why we need that ?

The help text for the option says:

 An architecture should select this if the noinstr macro is being used on
 functions to denote that the toolchain should avoid instrumenting such
 functions and is required for correctness.

All it really seems to do is restrict the conditions under which the
GCOV and KCOV options can be enabled.

> Is it tied to KASAN ? Is yes, why didn't we have it for PPC32 until now ?

Since we really do need to avoid instrumenting certain functions on
ppc64 (as in things will break if we do instrument them), I think we
need to select ARCH_WANTS_NO_INSTR.

For ppc32, as far as I recall there is much less code that runs in
real mode and it is mostly assembler (except for some boot code),
mostly because all addresses have to be explicitly translated to
physical addresses for 32-bit real-mode code, unlike ppc64 where we
can use access memory in the linear mapping using virtual addresses
because of the fact that the CPU ignores the top 4 bits of the
effective address in real mode.  That said, there is a lot less code
that runs in real mode on ppc64 than there used to be.

> Maybe that's independant of KASAN and would be worth a separate patch ?

Yes, possibly, though KASAN does appear to be the only user of noinstr
in arch/powerpc.

[snip]

> > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
> > b/arch/powerpc/include/asm/book3s/64/radix.h
> > index d090d9612348..bafc9869afcd 100644
> > --- a/arch/powerpc/include/asm/book3s/64/radix.h
> > +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> > @@ -35,6 +35,11 @@
> >   #define RADIX_PMD_SHIFT   (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
> >   #define RADIX_PUD_SHIFT   (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
> >   #define RADIX_PGD_SHIFT   (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
> > +
> > +#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE)
> > +#define R_PTRS_PER_PMD (1 << RADIX_PMD_INDEX_SIZE)
> > +#define R_PTRS_PER_PUD (1 << RADIX_PUD_INDEX_SIZE)
> > +
> >   /*
> >* Size of EA range mapped by our pagetables.
> >*/
> > @@ -68,11 +73,11 @@
> >*
> >*
> >* 3rd quadrant expanded:
> > - * +--+
> > + * +--+  Highest address (0xc010)
> > + * +--+  KASAN shadow end (0xc00fc000)
> >* |  |
> >* |  |
> > - * |  |
> > - * +--+  Kernel vmemmap end 
> > (0xc010)
> > + * +--+  Kernel vmemmap end/shadow start 
> > (0xc00e)
> >* |  |
> >* |   512TB|
> >* |  |
> > @@ -126,6 +131,8 @@
> >   #define RADIX_VMEMMAP_SIZERADIX_KERN_MAP_SIZE
> >   #define RADIX_VMEMMAP_END (RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
> >   
> > +/* For the sizes of the shadow area, see kasan.h */
> > +
> 
> Why does this comment pops up here ?

Do you mean, why is it there at all, or do you mean why is it in that
particular place rather than closer to the "3rd quadrant expanded"
diagram or something?

> >   #ifndef __ASSEMBLY__
> >   #define RADIX_PTE_TABLE_SIZE  (sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
> >   #define RADIX_PMD_TABLE_SIZE  (sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
> > diff --git a/arch/powerpc/include/asm/interrupt.h 
> > b/arch/powerpc/include/asm/interrupt.h
> > index fc28f46d2f9d..fb244b6ca7f0 100644
> > --- a/arch/powerpc/include/asm/interrupt.h
> > +++ b/arch/powerpc/include/asm/interrupt.h
> > @@ -327,22 +327,46 @@ static inline void interrupt_nmi_enter_prepare(struct 
> > pt_regs *regs, struct inte
> > }
> >   #endif
> >   
> > +   /* If data relocations are enabled, it's safe to use nmi_enter() */
> > +   if (mfmsr() & MSR_DR) {
> > +   nmi_enter();
> > + 

Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()

2022-05-17 Thread Christophe Leroy


Le 15/05/2022 à 12:28, Michael Ellerman a écrit :
> On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote:
>> This series reduces by 70% the time required to activate
>> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.
>>
>> Measure is performed in function ftrace_replace_code() using mftb()
>> around the loop.
>>
>> With the series,
>> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
>> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.
>>
>> [...]
> 
> Patches 1, 3 and 4 applied to powerpc/next.
> 
> [1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without 
> CONFIG_MODULES
>
> https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596
> [3/4] powerpc/code-patching: Use jump_label for testing freed initmem
>
> https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6
> [4/4] powerpc/code-patching: Use jump_label to check if poking_init() is done
>
> https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a
> 

Patch 2 was the keystone of this series. What happened to it ?

Christophe