Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-13 Thread Guenter Roeck
On Thu, Oct 13, 2022 at 03:14:08PM +1000, Nicholas Piggin wrote:
> > > 
> > > BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
> > > caller is .__flush_tlb_pending+0x40/0xf0
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
> > > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
> > > Call Trace:
> > > [c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
> > > [c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150
> > > [c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
> > > [c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
> > > [c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
> > > [c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
> > > [c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
> > > [c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
> > > [c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
> > > [c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
> > > [c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
> > > [c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
> > > [c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
> > > [c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60
> > > 
> > > This in turn is because __flush_tlb_pending() calls:
> > > 
> > > static inline int mm_is_thread_local(struct mm_struct *mm)
> > > {
> > > 	return cpumask_equal(mm_cpumask(mm),
> > > 			      cpumask_of(smp_processor_id()));
> > > }
> > > 
> > > __flush_tlb_pending() has a comment about this:
> > > 
> > >   * Must be called from within some kind of spinlock/non-preempt region...
> > >   */
> > > void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
> > > 
> > > So I guess that didn't happen for some reason? Maybe this is indicative
> > > of some lock imbalance that then gets hit later?
> >
> > I managed to bisect that problem. Unfortunately it points to the
> > scheduler merge. No idea what to do about that. Any idea?
> > I am copying Peter and Ingo for comments.
> >
> 
> > # first bad commit: [30c37f69abf935b0228b8411713737377d9e] Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> 
> This might be a red herring because I can reproduce without it.
> I think we can fix this with some preempt critical sections, they
> don't look too much of a problem.
> 

Are you referring to the bisect of the "BUG:" message above, or to the other
problem? I can try to repeat the bisect with some retries if you
think that 30c37f69a isn't responsible for "BUG: using
smp_processor_id() in preemptible [] code".

Thanks,
Guenter


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Nicholas Piggin
On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote:
> On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> > Guenter Roeck  writes:
> > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> > >> 
> > >> I've also managed to not hit this bug a few times. When it triggers,
> > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > >> optimized if possible.", there's a long hang - tens of seconds before it
> > >> continues. When it doesn't trigger, there's no hang at that point in the
> > >> boot process.
> > >> 
> > >
> > > I managed to bisect the problem. See below for results. Reverting the
> > > offending patch fixes the problem for me.
> > 
> > Thanks.
> > 
> > This is probably down to me/us not testing with PREEMPT enabled enough.
> > 
> Not sure. My configuration has
>
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set

Okay, I reproduced it; it just takes a while to hit.

Thanks,
Nick


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Guenter Roeck

On 10/12/22 22:03, Nicholas Piggin wrote:
> On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote:
>> On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
>>> Guenter Roeck  writes:
>>>> On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
>>>>>
>>>>> I've also managed to not hit this bug a few times. When it triggers,
>>>>> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
>>>>> optimized if possible.", there's a long hang - tens of seconds before it
>>>>> continues. When it doesn't trigger, there's no hang at that point in the
>>>>> boot process.
>>>>>
>>>>
>>>> I managed to bisect the problem. See below for results. Reverting the
>>>> offending patch fixes the problem for me.
>>>
>>> Thanks.
>>>
>>> This is probably down to me/us not testing with PREEMPT enabled enough.
>>>
>> Not sure. My configuration has
>>
>> CONFIG_PREEMPT_NONE=y
>> # CONFIG_PREEMPT_VOLUNTARY is not set
>> # CONFIG_PREEMPT is not set
>
> Thanks very much for helping with this. The config snippet you posted here
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html
> has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig?
>
> I can't reproduce this so far using your config and qemu command line,
> but the patch you've bisected it to definitely could cause this. I'll
> keep trying...



Uuh, sorry, I think I got confused with running multiple bisects on the
same branch, and took the above from a different bisect run. You are
correct, PREEMPT is enabled in the configuration.

Timing is definitely involved; I see the problem more often on a loaded
system. To bisect it, I had to repeat the test for each bisect step
several times (I set the limit to 20 retries; that gave me reliable
results).

Guenter


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Jason A. Donenfeld
On Thu, Oct 13, 2022 at 03:03:14PM +1000, Nicholas Piggin wrote:
> On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote:
> > On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> > > Guenter Roeck  writes:
> > > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> > > >> 
> > > >> I've also managed to not hit this bug a few times. When it triggers,
> > > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > > >> optimized if possible.", there's a long hang - tens of seconds before it
> > > >> continues. When it doesn't trigger, there's no hang at that point in
> > > >> the boot process.
> > > >> 
> > > >
> > > > I managed to bisect the problem. See below for results. Reverting the
> > > > offending patch fixes the problem for me.
> > > 
> > > Thanks.
> > > 
> > > This is probably down to me/us not testing with PREEMPT enabled enough.
> > > 
> > Not sure. My configuration has
> >
> > CONFIG_PREEMPT_NONE=y
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > # CONFIG_PREEMPT is not set
> 
> Thanks very much for helping with this. The config snippet you posted here
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html
> has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig?
> 
> I can't reproduce this so far using your config and qemu command line,
> but the patch you've bisected it to definitely could cause this. I'll
> keep trying...

Voilà: https://xn--4db.cc/dt00j0mt reproduces it for me.

> 
> Thanks,
> Nick
> 
> [...]
> > > > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
> 


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Nicholas Piggin
On Thu Oct 13, 2022 at 4:37 AM AEST, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 10:48:26AM -0700, Guenter Roeck wrote:
> > > I've also managed to not hit this bug a few times. When it triggers,
> > > after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > > optimized if possible.", there's a long hang - tens of seconds before it
> > > continues. When it doesn't trigger, there's no hang at that point in the
> > > boot process.
> > > 
> > 
> > That probably explains why my attempts to bisect the problem were
> > unsuccessful.
>
> So I just did this:
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 2fe28eeb2f38..2d70bc09db7e 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1212,6 +1212,7 @@ static void __cold try_to_generate_entropy(void)
>  	struct entropy_timer_state stack;
>  	unsigned int i, num_different = 0;
>  	unsigned long last = random_get_entropy();
> +	return;
>  
>  	for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) {
>  		stack.entropy = random_get_entropy();
>
> And then ran it, and now we get the lockup from the idle process:

Yep that rules out the random code. And really if it was calling
schedule() it shouldn't be getting a softlockup anyway.

Thanks,
Nick


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Nicholas Piggin
On Thu Oct 13, 2022 at 2:43 PM AEST, Guenter Roeck wrote:
> On 10/12/22 10:20, Jason A. Donenfeld wrote:
> > On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
> >> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> >>> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> >>>> NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> >>>> LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> >>>> Call Trace:
> >>>> [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> >>>> [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> >>>> [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> >>>> [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> >>>> [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> >>>> [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> >>>> [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> >>>> [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> >>>> [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> >>>
> >>> Obviously the first couple lines of this concern me a bit. But I think
> >>> actually this might just be a catalyst for another bug. You could view
> >>> that function as basically just:
> >>>
> >>>  while (something)
> >>>   schedule();
> >>>
> >>> And I guess in the process of calling the scheduler a lot, which toggles
> >>> interrupts a lot, something got wedged.
> >>>
> >>> Curious, though, I did try to reproduce this, to no avail. My .config is
> >>> https://xn--4db.cc/rBvHWfDZ . What's yours?
> >>>
> >>
> >> Attached. My qemu command line is
> > 
> > Okay, thanks, I reproduced it. In this case, I suspect
> > try_to_generate_entropy() is just the messenger. There's an earlier
> > problem:
> > 
> > BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
> > caller is .__flush_tlb_pending+0x40/0xf0
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
> > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
> > Call Trace:
> > [c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
> > [c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150
> > [c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
> > [c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
> > [c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
> > [c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
> > [c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
> > [c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
> > [c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
> > [c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
> > [c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
> > [c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
> > [c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
> > [c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60
> > 
> > This in turn is because __flush_tlb_pending() calls:
> > 
> > static inline int mm_is_thread_local(struct mm_struct *mm)
> > {
> > 	return cpumask_equal(mm_cpumask(mm),
> > 			      cpumask_of(smp_processor_id()));
> > }
> > 
> > __flush_tlb_pending() has a comment about this:
> > 
> >   * Must be called from within some kind of spinlock/non-preempt region...
> >   */
> > void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
> > 
> > So I guess that didn't happen for some reason? Maybe this is indicative
> > of some lock imbalance that then gets hit later?
>
> I managed to bisect that problem. Unfortunately it points to the
> scheduler merge. No idea what to do about that. Any idea?
> I am copying Peter and Ingo for comments.
>

> # first bad commit: [30c37f69abf935b0228b8411713737377d9e] Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

This might be a red herring because I can reproduce without it.
I think we can fix this with some preempt critical sections, they
don't look too much of a problem.

I don't know why it's not showing up earlier than this release,
I'll look into it a bit more.

Thanks,
Nick


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Nicholas Piggin
On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote:
> On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> > Guenter Roeck  writes:
> > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> > >> 
> > >> I've also managed to not hit this bug a few times. When it triggers,
> > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > >> optimized if possible.", there's a long hang - tens of seconds before it
> > >> continues. When it doesn't trigger, there's no hang at that point in the
> > >> boot process.
> > >> 
> > >
> > > I managed to bisect the problem. See below for results. Reverting the
> > > offending patch fixes the problem for me.
> > 
> > Thanks.
> > 
> > This is probably down to me/us not testing with PREEMPT enabled enough.
> > 
> Not sure. My configuration has
>
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set

Thanks very much for helping with this. The config snippet you posted here
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html
has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig?

I can't reproduce this so far using your config and qemu command line,
but the patch you've bisected it to definitely could cause this. I'll
keep trying...

Thanks,
Nick

[...]
> > > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending



Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Guenter Roeck

On 10/12/22 10:20, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
>> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
>>> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
>>>> NIP [c0031630] .replay_soft_interrupts+0x60/0x300
>>>> LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
>>>> Call Trace:
>>>> [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
>>>> [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
>>>> [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
>>>> [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
>>>> [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
>>>> [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
>>>> [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
>>>> [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
>>>> [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
>>>
>>> Obviously the first couple lines of this concern me a bit. But I think
>>> actually this might just be a catalyst for another bug. You could view
>>> that function as basically just:
>>>
>>> 	while (something)
>>> 		schedule();
>>>
>>> And I guess in the process of calling the scheduler a lot, which toggles
>>> interrupts a lot, something got wedged.
>>>
>>> Curious, though, I did try to reproduce this, to no avail. My .config is
>>> https://xn--4db.cc/rBvHWfDZ . What's yours?
>>>
>>
>> Attached. My qemu command line is
>
> Okay, thanks, I reproduced it. In this case, I suspect
> try_to_generate_entropy() is just the messenger. There's an earlier
> problem:
>
> BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
> caller is .__flush_tlb_pending+0x40/0xf0
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
> Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
> Call Trace:
> [c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
> [c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150
> [c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
> [c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
> [c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
> [c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
> [c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
> [c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
> [c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
> [c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
> [c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
> [c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
> [c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
> [c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60
>
> This in turn is because __flush_tlb_pending() calls:
>
> static inline int mm_is_thread_local(struct mm_struct *mm)
> {
> 	return cpumask_equal(mm_cpumask(mm),
> 			      cpumask_of(smp_processor_id()));
> }
>
> __flush_tlb_pending() has a comment about this:
>
>   * Must be called from within some kind of spinlock/non-preempt region...
>   */
> void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
>
> So I guess that didn't happen for some reason? Maybe this is indicative
> of some lock imbalance that then gets hit later?

I managed to bisect that problem. Unfortunately it points to the
scheduler merge. No idea what to do about that. Any idea?
I am copying Peter and Ingo for comments.

Guenter

---
# bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
# good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect start 'HEAD' 'v6.0'
# good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0
# good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect good f01603979a4afaad7504a728918b678d572cda9e
# bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547
# good: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
git bisect good 493ffd6605b2d3d4dc7008ab927dba319f36671f
# bad: [cdf072acb5baa18e5b05bdf3f13d6481f62396fc] Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
git bisect bad cdf072acb5baa18e5b05bdf3f13d6481f62396fc
# bad: [55be6084c8e0e0ada9278c2ab60b7a584378efda] Merge tag 'timers-core-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Guenter Roeck
On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> Guenter Roeck  writes:
> > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> >> 
> >> I've also managed to not hit this bug a few times. When it triggers,
> >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> >> optimized if possible.", there's a long hang - tens of seconds before it
> >> continues. When it doesn't trigger, there's no hang at that point in the
> >> boot process.
> >> 
> >
> > I managed to bisect the problem. See below for results. Reverting the
> > offending patch fixes the problem for me.
> 
> Thanks.
> 
> This is probably down to me/us not testing with PREEMPT enabled enough.
> 
Not sure. My configuration has

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

Guenter

> cheers
> 
> > ---
> > # bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
> > git bisect start 'HEAD' 'v6.0'
> > # good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> > git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0
> > # good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
> > git bisect good f01603979a4afaad7504a728918b678d572cda9e
> > # bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> > git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547
> > # bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
> > git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f
> > # good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
> > git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0
> > # bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly disable usage of SPE instructions
> > git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23
> > # good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE
> > git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e
> > # good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic fallocate compatibility syscall
> > git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3
> > # good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt: Fix false warning in context tracking due to idle state
> > git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1
> > # bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm helpers from GOT to TOC relative addressing
> > git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf
> > # bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry
> > git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a
> > # bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
> > git bisect bad e485f6c751e0a969327336c635ca602feea117f0
> > # good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs hard disabled in boot paca
> > git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2
> > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Michael Ellerman
Guenter Roeck  writes:
> On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
>> 
>> I've also managed to not hit this bug a few times. When it triggers,
>> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
>> optimized if possible.", there's a long hang - tens of seconds before it
>> continues. When it doesn't trigger, there's no hang at that point in the
>> boot process.
>> 
>
> I managed to bisect the problem. See below for results. Reverting the
> offending patch fixes the problem for me.

Thanks.

This is probably down to me/us not testing with PREEMPT enabled enough.

cheers

> ---
> # bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
> git bisect start 'HEAD' 'v6.0'
> # good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0
> # good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
> git bisect good f01603979a4afaad7504a728918b678d572cda9e
> # bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547
> # bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
> git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f
> # good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
> git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0
> # bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly disable usage of SPE instructions
> git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23
> # good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE
> git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e
> # good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic fallocate compatibility syscall
> git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3
> # good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt: Fix false warning in context tracking due to idle state
> git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1
> # bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm helpers from GOT to TOC relative addressing
> git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf
> # bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry
> git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a
> # bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
> git bisect bad e485f6c751e0a969327336c635ca602feea117f0
> # good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs hard disabled in boot paca
> git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2
> # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Guenter Roeck
On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> 
> I've also managed to not hit this bug a few times. When it triggers,
> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> optimized if possible.", there's a long hang - tens of seconds before it
> continues. When it doesn't trigger, there's no hang at that point in the
> boot process.
> 

I managed to bisect the problem. See below for results. Reverting the
offending patch fixes the problem for me.

Guenter

---
# bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
# good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect start 'HEAD' 'v6.0'
# good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0
# good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect good f01603979a4afaad7504a728918b678d572cda9e
# bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547
# bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f
# good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0
# bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly disable usage of SPE instructions
git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23
# good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE
git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e
# good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic fallocate compatibility syscall
git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3
# good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt: Fix false warning in context tracking due to idle state
git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1
# bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm helpers from GOT to TOC relative addressing
git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf
# bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry
git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a
# bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
git bisect bad e485f6c751e0a969327336c635ca602feea117f0
# good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs hard disabled in boot paca
git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2
# first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Jason A. Donenfeld
On Wed, Oct 12, 2022 at 10:48:26AM -0700, Guenter Roeck wrote:
> > I've also managed to not hit this bug a few times. When it triggers,
> > after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > optimized if possible.", there's a long hang - tens of seconds before it
> > continues. When it doesn't trigger, there's no hang at that point in the
> > boot process.
> > 
> 
> That probably explains why my attempts to bisect the problem were
> unsuccessful.

So I just did this:

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 2fe28eeb2f38..2d70bc09db7e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1212,6 +1212,7 @@ static void __cold try_to_generate_entropy(void)
 	struct entropy_timer_state stack;
 	unsigned int i, num_different = 0;
 	unsigned long last = random_get_entropy();
+	return;
 
 	for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) {
 		stack.entropy = random_get_entropy();

And then ran it, and now we get the lockup from the idle process:

udhcpc: started, v1.33.0
udhcpc: sending discover
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #10
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
NIP:  c00300f8 LR: c00304e8 CTR: c001a410
REGS: c28c79a8 TRAP: 0900   Not tainted  (6.0.0-28380-gde492c83cae0-dirty)
MSR:  8000b032   CR: 24088442  XER: 
IRQMASK: 0
GPR00: c00304e8 c28c7b30 c1435500 c28c79a8
GPR04: c13366c0  0010029c 
GPR08: c2d3bbb0  c2883d00 c2915500
GPR12: 44088442 c2e0 0007 02295698
GPR16: 039400e8 02295258 02295660 022953d0
GPR20: 02295b10 022b34d0 02295b38 03945500
GPR24: 03945500 0008 c2883d80 c2883d00
GPR28: c290d0c0 0001 c290d018 c290cc78
NIP [c00300f8] .replay_soft_interrupts+0x28/0x2d0
LR [c00304e8] .arch_local_irq_restore+0x148/0x1a0
Call Trace:
[c28c7b30] [c00304e8] .arch_local_irq_restore+0x148/0x1a0 (unreliable)
[c28c7bb0] [c001a388] .arch_cpu_idle+0xb8/0x140
[c28c7c30] [c0fd4940] .default_idle_call+0x80/0xc8
[c28c7ca0] [c0148480] .do_idle+0x150/0x1a0
[c28c7d50] [c0148748] .cpu_startup_entry+0x38/0x40
[c28c7dd0] [c00113a8] .rest_init+0x168/0x170
[c28c7e60] [c2004224] .arch_post_acpi_subsys_init+0x0/0x24
[c28c7ed0] [c2004ba8] .start_kernel+0x8d0/0x924
[c28c7f90] [c000d4ac] start_here_common+0x1c/0x20
Instruction dump:
6000 6000 7c0802a6 f8010010 f821fe01 6000 6000 38610078
e92d0af8 f92101f8 3920 4803a491 <6000> 3920 e9410180 f92101b0
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G L 6.0.0-28380-gde492c83cae0-dirty #10
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
Call Trace:
[c28c74a0] [c0f93b90] .dump_stack_lvl+0x7c/0xc4 (unreliable)
[c28c7530] [c00d2a58] .panic+0x180/0x438
[c28c75e0] [c0232424] .watchdog_timer_fn+0x3a4/0x410
[c28c76a0] [c01cb964] .__hrtimer_run_queues+0x1f4/0x590
[c28c77a0] [c01cc354] .hrtimer_interrupt+0x134/0x300
[c28c7860] [c0021cd4] .timer_interrupt+0x1c4/0x5d0
[c28c7930] [c00302f8] .replay_soft_interrupts+0x228/0x2d0
[c28c7b30] [c00304e8] .arch_local_irq_restore+0x148/0x1a0
[c28c7bb0] [c001a388] .arch_cpu_idle+0xb8/0x140
[c28c7c30] [c0fd4940] .default_idle_call+0x80/0xc8
[c28c7ca0] [c0148480] .do_idle+0x150/0x1a0
[c28c7d50] [c0148748] .cpu_startup_entry+0x38/0x40
[c28c7dd0] [c00113a8] .rest_init+0x168/0x170
[c28c7e60] [c2004224] .arch_post_acpi_subsys_init+0x0/0x24
[c28c7ed0] [c2004ba8] .start_kernel+0x8d0/0x924
[c28c7f90] [c0


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Guenter Roeck
On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
> > On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> > > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > > > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > > > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > > > Call Trace:
> > > > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > > > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > > > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > > > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > > > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > > > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > > > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > > > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > > > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> > > 
> > > Obviously the first couple lines of this concern me a bit. But I think
> > > actually this might just be a catalyst for another bug. You could view
> > > that function as basically just:
> > > 
> > > while (something)
> > >   schedule();
> > > 
> > > And I guess in the process of calling the scheduler a lot, which toggles
> > > interrupts a lot, something got wedged.
> > > 
> > > Curious, though, I did try to reproduce this, to no avail. My .config is
> > > https://xn--4db.cc/rBvHWfDZ . What's yours?
> > > 
> > 
> > Attached. My qemu command line is
> 
> Okay, thanks, I reproduced it. In this case, I suspect
> try_to_generate_entropy() is just the messenger. There's an earlier
> problem:
> 

That problem is not new but has existed for a couple of releases, and has
never caused a hang until now.

> BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
> caller is .__flush_tlb_pending+0x40/0xf0
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
> Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
> Call Trace:
> [c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
> [c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150
> [c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
> [c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
> [c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
> [c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
> [c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
> [c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
> [c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
> [c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
> [c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
> [c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
> [c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
> [c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60
> 
> This in turn is because __flush_tlb_pending() calls:
> 
> static inline int mm_is_thread_local(struct mm_struct *mm)
> {
> 	return cpumask_equal(mm_cpumask(mm),
> 			     cpumask_of(smp_processor_id()));
> }
> 
> __flush_tlb_pending() has a comment about this:
> 
>  * Must be called from within some kind of spinlock/non-preempt region...
>  */
> void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
> 
> So I guess that didn't happen for some reason? Maybe this is indicative
> of some lock imbalance that then gets hit later?
> 
> I've also managed to not hit this bug a few times. When it triggers,
> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> optimized if possible.", there's a long hang - tens of seconds before it
> continues. When it doesn't trigger, there's no hang at that point in the
> boot process.
> 

That probably explains why my attempts to bisect the problem were
unsuccessful.

Thanks,
Guenter


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Jason A. Donenfeld
On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > > Call Trace:
> > > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> > 
> > Obviously the first couple lines of this concern me a bit. But I think
> > actually this might just be a catalyst for another bug. You could view
> > that function as basically just:
> > 
> > while (something)
> >   schedule();
> > 
> > And I guess in the process of calling the scheduler a lot, which toggles
> > interrupts a lot, something got wedged.
> > 
> > Curious, though, I did try to reproduce this, to no avail. My .config is
> > https://xn--4db.cc/rBvHWfDZ . What's yours?
> > 
> 
> Attached. My qemu command line is

Okay, thanks, I reproduced it. In this case, I suspect
try_to_generate_entropy() is just the messenger. There's an earlier
problem:

BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
caller is .__flush_tlb_pending+0x40/0xf0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
Call Trace:
[c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
[c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150
[c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
[c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
[c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
[c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
[c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
[c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
[c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
[c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
[c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
[c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
[c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
[c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60

This in turn is because __flush_tlb_pending() calls:

static inline int mm_is_thread_local(struct mm_struct *mm)
{
	return cpumask_equal(mm_cpumask(mm),
			     cpumask_of(smp_processor_id()));
}

__flush_tlb_pending() has a comment about this:

 * Must be called from within some kind of spinlock/non-preempt region...
 */
void __flush_tlb_pending(struct ppc64_tlb_batch *batch)

So I guess that didn't happen for some reason? Maybe this is indicative
of some lock imbalance that then gets hit later?
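Here is why smp_processor_id() complains in this path. In preemptible code the task can migrate right after it reads the CPU number. A check such as mm_is_thread_local() may then compare the mm's CPU mask against a CPU the task is no longer running on. Below is a minimal userspace model of the comparison. The 64-bit mask is a hypothetical stand-in for struct cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-CPU bitmask standing in for struct cpumask. */
typedef uint64_t cpumask64;

static cpumask64 cpumask_of_cpu(unsigned int cpu)
{
	return UINT64_C(1) << cpu;	/* cpu must be < 64 */
}

/* Same shape as mm_is_thread_local(): the mm counts as local only if the
 * current CPU is the only CPU in its mask. The result is only meaningful
 * while the caller cannot migrate, which is why the kernel wants this
 * evaluated in a non-preemptible region. */
static bool mm_is_local_to(cpumask64 mm_mask, unsigned int cur_cpu)
{
	return mm_mask == cpumask_of_cpu(cur_cpu);
}
```

In kernel code the usual way to make such a check stable is to run it with preemption disabled. Two options are a get_cpu()/put_cpu() pair, or holding a spinlock as the __flush_tlb_pending() comment asks.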

I've also managed to not hit this bug a few times. When it triggers,
after "kprobes: kprobe jump-optimization is enabled. All kprobes are
optimized if possible.", there's a long hang - tens of seconds before it
continues. When it doesn't trigger, there's no hang at that point in the
boot process.

Jason


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Guenter Roeck
On Wed, Oct 12, 2022 at 10:45:46AM -0600, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > > Call Trace:
> > > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> > 
> > Obviously the first couple lines of this concern me a bit. But I think
> > actually this might just be a catalyst for another bug. You could view
> > that function as basically just:
> > 
> > while (something)
> >   schedule();
> > 
> > And I guess in the process of calling the scheduler a lot, which toggles
> > interrupts a lot, something got wedged.
> > 
> > Curious, though, I did try to reproduce this, to no avail. My .config is
> > https://xn--4db.cc/rBvHWfDZ . What's yours?
> 
> I also just tried using your github linux-build-test scripts as a guide
> for constructing a config -- https://xn--4db.cc/B0HpEQDQ -- and loaded
> up your rootfs over sdhci and such, and still couldn't manage to
> reproduce. I tried commenting out the line "if (!bits)" in
> _credit_init_bits(), so that the rng would never initialize, so that the
> schedule() loop would just keep on running indefinitely, but still no
> dice.
> 
> But also, I'm running Linus' tree. From your log, I see
> "6.0.0-rc2-00163-ga5edf9815dd7". So maybe these bugs got fixed
> elsewhere?
> 

Blame me for not attaching the latest crash report.

Guenter

---
BUG: soft lockup - CPU#0 stuck for 23s! [dd:111]
Modules linked in:
CPU: 0 PID: 111 Comm: dd Not tainted 6.0.0-11414-g49da07006239 #1
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
NIP:  c0031630 LR: c0031964 CTR: 
REGS: c7d5b6a8 TRAP: 0900   Not tainted  (6.0.0-11414-g49da07006239)
MSR:  80009032   CR: 28002228  XER: 
IRQMASK: 0
GPR00: c0031964 c7d5b870 c13e5500 c7d5b6a8
GPR04: c125e1c0  c7d5b814 c291d018
GPR08: c2d4bbb8  c7356400 c2d21098
GPR12: 2800 c2e2 100d32e0 100d32b4
GPR16: 100d3301 100d32b9 100d3358 100d32bf
GPR20: 2000 100d3372 100d331e c7356c18
GPR24:  0e60 0900 0500
GPR28: 0a00 0f00 0002 0003
NIP [c0031630] .replay_soft_interrupts+0x60/0x300
LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
Call Trace:
[c7d5b870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
[c7d5b8f0] [c0f8bac4] .__schedule+0x664/0xa50
[c7d5b9d0] [c0f8bf30] .schedule+0x80/0x140
[c7d5ba50] [c093085c] .try_to_generate_entropy+0x118/0x174
[c7d5bb40] [c092fa64] .urandom_read_iter+0x74/0x140
[c7d5bbc0] [c03b0044] .vfs_read+0x284/0x2d0
[c7d5bcd0] [c03b0d2c] .ksys_read+0xdc/0x130
[c7d5bd80] [c002a88c] .system_call_exception+0x19c/0x330
[c7d5be10] [c000c1d4] system_call_common+0xf4/0x258
--- interrupt: c00 at 0x7fffb5c9d49c
NIP:  7fffb5c9d49c LR: 1000da90 CTR: 
REGS: c7d5be80 TRAP: 0c00   Not tainted  (6.0.0-11414-g49da07006239)
MSR:  8000f032   CR: 22002422  XER: 
IRQMASK: 0
GPR00: 0003 76dcc220 7fffb5d97300 
GPR04: 101102a0 0020  
GPR08:    
GPR12:  7fffb5e6aac0 100d32e0 100d32b4
GPR16: 100d3301 100d32b9 100d3358 100d32bf
GPR20: 2000 100d3372 100d331e 
GPR24: 7fff 100b3a9c 101102a0 0020
GPR28: 101025c0 0020  
NIP [7fffb5c9d49c] 0x7fffb5c9d49c
LR [1000da90] 0x1000da90
--- interrupt: c00
Instruction dump:
3b600500 3b800a00 3ba00f00 f8010010 f821fdc1 6000 6000 38610078
e92d0af8 f92101f8 3920 48039745 <6000> 3900 e9410180 

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Jason A. Donenfeld
On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > Call Trace:
> > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> 
> Obviously the first couple lines of this concern me a bit. But I think
> actually this might just be a catalyst for another bug. You could view
> that function as basically just:
> 
> while (something)
>   schedule();
> 
> And I guess in the process of calling the scheduler a lot, which toggles
> interrupts a lot, something got wedged.
> 
> Curious, though, I did try to reproduce this, to no avail. My .config is
> https://xn--4db.cc/rBvHWfDZ . What's yours?

I also just tried using your github linux-build-test scripts as a guide
for constructing a config -- https://xn--4db.cc/B0HpEQDQ -- and loaded
up your rootfs over sdhci and such, and still couldn't manage to
reproduce. I tried commenting out the line "if (!bits)" in
_credit_init_bits(), so that the rng would never initialize, so that the
schedule() loop would just keep on running indefinitely, but still no
dice.
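The "while (something) schedule();" shape, and the experiment of never letting the rng initialize, can be modeled in userspace. This is a sketch under assumptions. `ready` and `stop` are hypothetical stand-ins for the loop's exit conditions, assumed here to be crng_ready() and a pending signal. sched_yield() plays the part of schedule().

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical predicates standing in for the assumed exit conditions. */
typedef bool (*pred_fn)(void *ctx);

/* Yield the CPU each round until ready() or stop() is true. Returns the
 * number of rounds taken. If ready() never becomes true and stop() never
 * fires, this never returns, like the rng that never initializes. */
static unsigned long wait_for_ready(pred_fn ready, pred_fn stop, void *ctx)
{
	unsigned long rounds = 0;

	while (!ready(ctx) && !stop(ctx)) {
		sched_yield();	/* userspace analogue of schedule() */
		++rounds;
	}
	return rounds;
}

/* Hypothetical helpers: become ready after a fixed number of checks. */
struct counter {
	unsigned long calls;
	unsigned long ready_after;
};

static bool ready_after_n(void *ctx)
{
	struct counter *c = ctx;

	return c->calls++ >= c->ready_after;
}

static bool never(void *ctx) { (void)ctx; return false; }
```

In the model, each round is one trip into the scheduler. On the real system each round also disables and re-enables interrupts, which is where the reported traces point.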

But also, I'm running Linus' tree. From your log, I see
"6.0.0-rc2-00163-ga5edf9815dd7". So maybe these bugs got fixed
elsewhere?

Jason


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Guenter Roeck
On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > Call Trace:
> > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> 
> Obviously the first couple lines of this concern me a bit. But I think
> actually this might just be a catalyst for another bug. You could view
> that function as basically just:
> 
> while (something)
>   schedule();
> 
> And I guess in the process of calling the scheduler a lot, which toggles
> interrupts a lot, something got wedged.
> 
> Curious, though, I did try to reproduce this, to no avail. My .config is
> https://xn--4db.cc/rBvHWfDZ . What's yours?
> 

Attached. My qemu command line is

qemu-system-ppc64 -M mac99 -cpu ppc64 \
 -m 1G -kernel vmlinux -snapshot -device e1000,netdev=net0 \
 -netdev user,id=net0 -device sdhci-pci -device sd-card,drive=d0 \
 -drive file=/var/cache/buildbot/ppc64/rootfs.ext2,format=raw,if=none,id=d0 \
 -nographic -vga none -monitor null -no-reboot \
 --append "root=/dev/mmcblk0 rootwait console=tty console=ttyS0"

Qemu version is 7.0. The root file system is from
https://github.com/groeck/linux-build-test/tree/master/rootfs/ppc64

I used to have self tests enabled, but with that (specifically, with
CONFIG_STRING_SELFTEST=y) I now get a different hang, so I disabled that
for the time being.

Guenter
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_DEBUG=y
CONFIG_NAMESPACES=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EMBEDDED=y
CONFIG_PROFILING=y
CONFIG_PPC64=y
# CONFIG_PPC_POWERNV is not set
CONFIG_DTL=y
# CONFIG_CPU_IDLE is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_BINFMT_MISC=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=m
CONFIG_XFRM_SUB_POLICY=y
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NET_IPIP=m
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_INET6_AH=y
CONFIG_INET6_ESP=y
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_TUNNEL=m
CONFIG_NETFILTER=y
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_U32=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Jason A. Donenfeld
On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> Call Trace:
> [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258

Obviously the first couple lines of this concern me a bit. But I think
actually this might just be a catalyst for another bug. You could view
that function as basically just:

while (something)
  schedule();

And I guess in the process of calling the scheduler a lot, which toggles
interrupts a lot, something got wedged.
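The traces stop in replay_soft_interrupts(), called from arch_local_irq_restore(). On 64-bit Book3S PowerPC, disabling interrupts is lazy. It only sets a soft mask, and interrupts that arrive while masked are recorded rather than handled. The recorded interrupts are replayed when interrupts are re-enabled. Below is a heavily simplified model of that bookkeeping, with hypothetical names and bit values rather than the kernel's. It shows why every restore that finds pending work goes through the replay path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interrupt bits, not the kernel's PACA_IRQ_* values. */
enum { IRQ_DEC = 1u << 0, IRQ_EE = 1u << 1 };

struct cpu_state {
	bool soft_masked;	/* models the soft interrupt mask */
	unsigned int happened;	/* models the per-CPU "irq happened" record */
	unsigned int handled;	/* number of handler runs, for illustration */
};

/* An interrupt arriving while soft-masked is only recorded. */
static void irq_arrives(struct cpu_state *c, unsigned int irq)
{
	if (c->soft_masked)
		c->happened |= irq;
	else
		c->handled++;
}

static void model_irq_disable(struct cpu_state *c)
{
	c->soft_masked = true;
}

/* Re-enabling replays everything recorded while masked, lowest bit first. */
static void model_irq_enable(struct cpu_state *c)
{
	c->soft_masked = false;
	while (c->happened) {
		unsigned int irq = c->happened & (~c->happened + 1u);

		c->happened &= ~irq;
		c->handled++;
	}
}
```

The model only makes the traces easier to read. It does not explain why the replay appears to make no progress in the reported lockups.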

Curious, though, I did try to reproduce this, to no avail. My .config is
https://xn--4db.cc/rBvHWfDZ . What's yours?

Jason


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-12 Thread Guenter Roeck
On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Hi Linus,
> 
> Please pull powerpc updates for 6.1.
> 
> No conflicts with your tree. There will be a conflict when you merge the
> kbuild tree, due to us renaming head_fsl_booke.S to head_85xx.S. The
> resolution is mostly trivial, linux-next has the correct result if it's
> unclear.
> 

Post-merge problems are much more exciting when trying to run mac99
emulations in qemu.

Enabling KFENCE results in log messages such as


WARNING: inconsistent lock state
6.0.0-rc2-00163-ga5edf9815dd7 #1 Tainted: G N

inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
c2734d68 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600
{IN-SOFTIRQ-W} state was registered at:
  .lock_acquire+0x20c/0x520
  ._raw_spin_lock+0x4c/0x70
  .native_hpte_invalidate+0x62c/0x840
  .hash__kernel_map_pages+0x450/0x640
  .kfence_protect+0x58/0xc0
  .kfence_guarded_free+0x374/0x5a0
  .__slab_free+0x340/0x670
  .__d_free+0x2c/0x50
  .rcu_core+0x3f4/0x1750
  .__do_softirq+0x1dc/0x7dc
  .do_softirq_own_stack+0x40/0x60
  0xc775bca0
  .irq_exit+0x1e8/0x220
  .timer_interrupt+0x284/0x700
  decrementer_common_virt+0x208/0x210
irq event stamp: 243607
hardirqs last  enabled at (243607): [] .__slab_free+0x324/0x670
hardirqs last disabled at (243606): [] .__slab_free+0x1f4/0x670
softirqs last  enabled at (242982): [] .__do_softirq+0x7ac/0x7dc
softirqs last disabled at (242973): [] .do_softirq_own_stack+0x40/0x60

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(native_tlbie_lock);
  <Interrupt>
    lock(native_tlbie_lock);

 *** DEADLOCK ***

and, indeed, there appear to be various deadlocks.
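Lockdep's complaint is that native_tlbie_lock is taken in two incompatible ways. It is taken from softirq context, via KFENCE freeing memory in an RCU callback. It is also taken from process context with softirqs enabled. Suppose a softirq interrupts the process-context holder on the same CPU and tries to take the same lock. It then spins forever, because the holder cannot run to release it. Below is a toy userspace model of that scenario, using a C11 atomic_flag as a hypothetical stand-in for the spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy non-recursive spinlock standing in for native_tlbie_lock. */
static atomic_flag tlbie_lock = ATOMIC_FLAG_INIT;

static bool lock_try(atomic_flag *l) { return !atomic_flag_test_and_set(l); }
static void lock_release(atomic_flag *l) { atomic_flag_clear(l); }

/* "Softirq" path that also wants the lock (the IN-SOFTIRQ-W usage). A real
 * spin_lock() would spin forever here; the model uses a trylock so it can
 * report the deadlock instead of hanging. */
static bool softirq_handler_would_deadlock(void)
{
	if (lock_try(&tlbie_lock)) {
		lock_release(&tlbie_lock);
		return false;
	}
	return true;
}

/* Process context holds the lock with softirqs enabled (the SOFTIRQ-ON-W
 * usage), and optionally a softirq "arrives" on the same CPU meanwhile. */
static bool scenario(bool softirq_arrives_while_held)
{
	bool deadlock = false;
	bool got = lock_try(&tlbie_lock);

	assert(got);
	(void)got;
	if (softirq_arrives_while_held)
		deadlock = softirq_handler_would_deadlock();
	lock_release(&tlbie_lock);
	return deadlock;
}
```

In kernel code the conventional remedy is to take such a lock with softirqs or interrupts disabled in process context, for example with spin_lock_bh() or spin_lock_irqsave(). The thread does not settle which variant, if any, fits native_tlbie_lock.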

I had to disable KFENCE to be able to test further (or maybe KFENCE works
and points out the soft lockup problem observed below - hard for me to
determine).

>   powerpc/pseries: Move dtl scanning and steal time accounting to pseries platform

With this patch, CONFIG_DTL must be enabled if CONFIG_PPC_SPLPAR is enabled.
CONFIG_PPC_SPLPAR=y and CONFIG_DTL=n results in build failures due to

irq.c:(.text+0x2798): undefined reference to `.pseries_accumulate_stolen_time'

and many similar errors.

I had to enable CONFIG_DTL explicitly to be able to build my test images.
CONFIG_PPC_SPLPAR now depends on or requires CONFIG_DTL which in turn
depends on CONFIG_DEBUG_FS. That seems odd.

With all this worked around, I still get soft lockup problems when trying to
boot from SDHCI. I have not been able to bisect this problem.

BUG: soft lockup - CPU#0 stuck for 23s! [dd:111]
Modules linked in:
CPU: 0 PID: 111 Comm: dd Not tainted 6.0.0-10822-g60bb8154d1d7 #1
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
NIP:  c0031630 LR: c0031964 CTR: 
REGS: c7df36a8 TRAP: 0900   Not tainted  (6.0.0-10822-g60bb8154d1d7)
MSR:  8000b032   CR: 28002228  XER: 
IRQMASK: 0
GPR00: c0031964 c7df3870 c13e5500 c7df36a8
GPR04: c125dd80  c7df3814 c291d018
GPR08: c2d4bbb8  c7365100 c2d21098
GPR12: 2800 c2e2 100d32e0 100d32b4
GPR16: 100d3301 100d32b9 100d3358 100d32bf
GPR20: 2000 100d3372 100d331e c7365918
GPR24:  0e60 0900 0500
GPR28: 0a00 0f00 0002 0003
NIP [c0031630] .replay_soft_interrupts+0x60/0x300
LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
Call Trace:
[c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
[c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
[c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
[c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
[c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
[c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
[c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
[c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
[c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
--- interrupt: c00 at 0x7fff829fd49c
NIP:  7fff829fd49c LR: 1000da90 CTR: 
REGS: c7df3e80 TRAP: 0c00   Not tainted  (6.0.0-10822-g60bb8154d1d7)
MSR:  8000f032   CR: 22002422  XER: 
IRQMASK: 0
GPR00: 0003 7138df70 7fff82af7300 
GPR04: 101102a0 0020  
GPR08:    
GPR12:  7fff82bcaac0 

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-11 Thread Nicholas Piggin
On Tue Oct 11, 2022 at 7:35 PM AEST, Michael Ellerman wrote:
> "Jason A. Donenfeld"  writes:
> > On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote:
> >> "Jason A. Donenfeld"  writes:
> >> > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote:
> >> >> Hi Michael,
> >> >> 
> >> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
> >> >> > powerpc updates for 6.1
> >> >> > 
> >> >> >  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
> >> >> > 
> >> >> >  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
> >> >> > 
> >> >> >  - Add support for syscall wrappers.
> >> >> > 
> >> >> >  - Add support for KFENCE on 64-bit.
> >> >> > 
> >> >> >  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
> >> >> > 
> >> >> >  - Support execute-only memory when using the Radix MMU (P9 or later).
> >> >> > 
> >> >> >  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
> >> >> > 
> >> >> >  - Updates to our linker script to move more data into read-only sections.
> >> >> > 
> >> >> >  - Allow the VDSO to be randomised on 32-bit.
> >> >> > 
> >> >> >  - Many other small features and fixes.
> >> >> 
> >> >> FYI, something in here broke the wireguard test suite, which runs the
> >> >> iperf3 networking utility. The full log is here [1], but the relevant
> >> >> part is:
> >> >> 
> >> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
> >> >> Connecting to host 192.168.241.2, port 5201
> >> >> iperf3: error - failed to read /dev/urandom: Bad address
> >> >> 
> >> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in
> >> >> case you have an intuition.
> >> >
> >> > Huh. From iov_iter.c:
> >> >
> >> > static int copyout(void __user *to, const void *from, size_t n)
> >> > {
> >> > 	size_t before = n;
> >> > 	if (should_fail_usercopy())
> >> > 		return n;
> >> > 	if (access_ok(to, n)) {
> >> > 		instrument_copy_to_user(to, from, n);
> >> > 		n = raw_copy_to_user(to, from, n);
> >> > 		if (n == before)
> >> > 			pr_err("SARU n still %zu pointer is %lx\n", n, (unsigned long)to);
> >> > 	}
> >> > 	return n;
> >> > }
> >> >
> >> > I added the pr_err() there to catch the failure:
> >> > [3.443506] SARU n still 64 pointer is b78db000
> >> >
> >> > Also I managed to extract the failing portion of iperf3 into something
> >> > smaller:
> >> >
> >> > 	int temp;
> >> > 	char *x;
> >> > 	ssize_t l;
> >> > 	FILE *f;
> >> > 	char template[] = "/blah-XX";
> >> >
> >> > 	temp = mkstemp(template);
> >> > 	if (temp < 0)
> >> > 		panic("mkstemp");
> >> > 	if (unlink(template) < 0)
> >> > 		panic("unlink");
> >> > 	if (ftruncate(temp, 0x2) < 0)
> >> > 		panic("ftruncate");
> >> > 	x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0);
> >> > 	if (x == MAP_FAILED)
> >> > 		panic("mmap");
> >> > 	f = fopen("/dev/urandom", "rb");
> >> > 	if (!f)
> >> > 		panic("fopen");
> >> > 	setbuf(f, NULL);
> >> > 	if (fread(x, 1, 0x2, f) != 0x2)
> >> > 		panic("fread");
> >> 
> >> Does that fail for you reliably?
> >> 
> >> It succeeds for me running under qemu ppce500, though I'm not using your
> >> kernel config yet.
> >
> > Yes, every time without fail, across two systems and two qemu builds.
>
> OK. Joel worked out that it only fails when built with musl, so that's
> why it's succeeding for me (built with glibc).

This was independently discovered by several people, but we worked out it's
because musl uses ftruncate64 here, while glibc doesn't seem to. And
ftruncate64 got broken by the syscall wrappers patch on ppc32. The
kernel is seeing a 0 length ftruncate call, so the user access sigbuses
and can't copy anything.
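A misdecoded argument pair can turn into a zero-length ftruncate, and a userspace sketch shows how. The sketch assumes that on 32-bit PowerPC a 64-bit syscall argument occupies an aligned register pair. For ftruncate64(fd, length), that means r4 is left as padding and the length travels in r5 (high word) and r6 (low word). A decoder that forgets the padding register reads the padding as the high word and the real high word as the low word. The helper names are hypothetical, and this illustrates the idea only. It is not the actual wrapper code.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild a 64-bit value from a register pair, high word first. */
static uint64_t merge_64(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}

/* Hypothetical model of how ftruncate64(fd, length) lands in r3..r6:
 * fd in r3, r4 skipped so the pair starts on an odd register. */
static void pack_ftruncate64(uint32_t regs[4], uint32_t fd, uint64_t length)
{
	regs[0] = fd;				/* r3 */
	regs[1] = 0;				/* r4: padding, zeroed in this model */
	regs[2] = (uint32_t)(length >> 32);	/* r5: high word */
	regs[3] = (uint32_t)length;		/* r6: low word */
}

/* Correct decode: skip the padding register. */
static uint64_t decode_with_pad(const uint32_t regs[4])
{
	return merge_64(regs[2], regs[3]);
}

/* Wrong decode: treat r4/r5 as the pair. */
static uint64_t decode_without_pad(const uint32_t regs[4])
{
	return merge_64(regs[1], regs[2]);
}
```

For any length below 4 GiB the high word is zero. In this model the misdecoded length is therefore zero as well, consistent with the kernel seeing a zero-length ftruncate.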

This quick hack gets the test program working again. Only very lightly
tested so far...

Thanks,
Nick

---
diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 9840d572da55..9578cc5e4f84 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -89,6 +89,27 @@ long compat_sys_rt_sigreturn(void);
  * responsible for combining parameter pairs.
  */
 
+#ifdef CONFIG_PPC32
+long sys_ppc_pread64(unsigned int fd,
+		     char __user *ubuf, compat_size_t count,
+		     u32 reg6, u32 pos1, u32 pos2);
+long sys_ppc_pwrite64(unsigned int fd,
+		      const char __user *ubuf, compat_size_t count,
+		      u32 reg6, u32 pos1, u32 pos2);
+long sys_ppc_readahead(int fd, u32 r4,
+		       u32 offset1, u32 offset2, u32 count);
+long sys_ppc_truncate64(const char __user *path, u32 reg4,
+   

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-11 Thread Michael Ellerman
"Jason A. Donenfeld"  writes:
> On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote:
>> "Jason A. Donenfeld"  writes:
>> > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote:
>> >> Hi Michael,
>> >> 
>> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
>> >> > powerpc updates for 6.1
>> >> > 
>> >> >  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
>> >> > 
>> >> >  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
>> >> > 
>> >> >  - Add support for syscall wrappers.
>> >> > 
>> >> >  - Add support for KFENCE on 64-bit.
>> >> > 
>> >> >  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
>> >> > 
>> >> >  - Support execute-only memory when using the Radix MMU (P9 or later).
>> >> > 
>> >> >  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
>> >> > 
>> >> >  - Updates to our linker script to move more data into read-only sections.
>> >> > 
>> >> >  - Allow the VDSO to be randomised on 32-bit.
>> >> > 
>> >> >  - Many other small features and fixes.
>> >> 
>> >> FYI, something in here broke the wireguard test suite, which runs the
>> >> iperf3 networking utility. The full log is here [1], but the relevant part
>> >> is: 
>> >> 
>> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
>> >> Connecting to host 192.168.241.2, port 5201
>> >> iperf3: error - failed to read /dev/urandom: Bad address
>> >> 
>> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in
>> >> case you have an intuition.
>> >
>> > Huh. From iov_iter.c:
>> >
>> > static int copyout(void __user *to, const void *from, size_t n)
>> > {
>> > 	size_t before = n;
>> > 	if (should_fail_usercopy())
>> > 		return n;
>> > 	if (access_ok(to, n)) {
>> > 		instrument_copy_to_user(to, from, n);
>> > 		n = raw_copy_to_user(to, from, n);
>> > 		if (n == before)
>> > 			pr_err("SARU n still %zu pointer is %lx\n", n, (unsigned long)to);
>> > 	}
>> > 	return n;
>> > }
>> >
>> > I added the pr_err() there to catch the failure:
>> > [3.443506] SARU n still 64 pointer is b78db000
>> >
>> > Also I managed to extract the failing portion of iperf3 into something
>> > smaller:
>> >
>> > int temp;
>> > char *x;
>> > ssize_t l;
>> > FILE *f;
>> > char template[] = "/blah-XX";
>> >
>> > temp = mkstemp(template);
>> > if (temp < 0)
>> > panic("mkstemp");
>> > if (unlink(template) < 0)
>> > panic("unlink");
>> > if (ftruncate(temp, 0x2) < 0)
>> > panic("ftruncate");
>> > x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 
>> > 0);
>> > if (x == MAP_FAILED)
>> > panic("mmap");
>> > f = fopen("/dev/urandom", "rb");
>> > if (!f)
>> > panic("fopen");
>> > setbuf(f, NULL);
>> > if (fread(x, 1, 0x2, f) != 0x2)
>> > panic("fread");
>> 
>> Does that fail for you reliably?
>> 
>> It succeeds for me running under qemu ppce500, though I'm not using your
>> kernel config yet.
>
> Yes, every time without fail, across two systems and two qemu builds.

OK. Joel worked out that it only fails when built with musl, so that's
why it's succeeding for me (built with glibc).

cheers


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-11 Thread Michael Ellerman
"Jason A. Donenfeld"  writes:
> On Tue, Oct 11, 2022 at 12:44:20PM +1100, Michael Ellerman wrote:
>> "Jason A. Donenfeld"  writes:
>> > Hi Andrew,
>> >
>> > On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote:
>> >> Thanks for bisecting, this is interesting! Could you provide your
>> >> .config and the environment you're running in? Your reproducer doesn't
>> >> seem to trigger it on my baremetal POWER8 pseries_le_defconfig.
>> >
>> > Sure.
>> >
>> > .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE)
>> > Toolchain: 
>> > https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz
>> >
>> > You can also just run:
>> >
>> >   ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
>> >
>> > And that'll assemble the whole thing.
>> 
>> I tried that :)
>> 
>> What host OS are you running that on?
>> 
>> I get:
>> 
>>   mkdir -p /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc
>>   powerpc-linux-musl-gcc -o /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init -O3 -pipe  -std=gnu11 init.c
>>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory
>>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crti.o: No such file or directory
>>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crtbeginS.o: No such file or directory
>>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc
>>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc
>>   collect2: error: ld returned 1 exit status
>
> Here's what happened:
>
> - You started the thing and the kernel compile complained about an
>   unclean tree.
> - You ran mrproper.
> - You tried to run the thing again.
>
> amirite?

I think so yeah. I tried it on 3 different machines so I'm not sure
exactly what I did where, but I definitely ran mrproper on one of them.

> If so, what happened is that mrproper deleted the .o files from the
> toolchain. Solution:
>
>   ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) clean
>   ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
>
> Let me know how that goes.

Yep that works thanks.

And I see the iperf failure. I still can't see what the bug is, but
hopefully if I stare at it longer I'll work it out.

cheers


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Jason A. Donenfeld
On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote:
> "Jason A. Donenfeld"  writes:
> > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote:
> >> Hi Michael,
> >> 
> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
> >> > powerpc updates for 6.1
> >> > 
> >> >  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
> >> > 
> >> >  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
> >> > 
> >> >  - Add support for syscall wrappers.
> >> > 
> >> >  - Add support for KFENCE on 64-bit.
> >> > 
> > >> >  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
> >> > 
> >> >  - Support execute-only memory when using the Radix MMU (P9 or later).
> >> > 
> >> >  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
> >> > 
> > >> >  - Updates to our linker script to move more data into read-only sections.
> >> > 
> >> >  - Allow the VDSO to be randomised on 32-bit.
> >> > 
> >> >  - Many other small features and fixes.
> >> 
> >> FYI, something in here broke the wireguard test suite, which runs the
> >> iperf3 networking utility. The full log is here [1], but the relevant part
> >> is: 
> >> 
> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
> >> Connecting to host 192.168.241.2, port 5201
> >> iperf3: error - failed to read /dev/urandom: Bad address
> >> 
> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in
> >> case you have an intuition.
> >
> > Huh. From iov_iter.c:
> >
> > static int copyout(void __user *to, const void *from, size_t n)
> > {
> > 	size_t before = n;
> > 	if (should_fail_usercopy())
> > 		return n;
> > 	if (access_ok(to, n)) {
> > 		instrument_copy_to_user(to, from, n);
> > 		n = raw_copy_to_user(to, from, n);
> > 		if (n == before)
> > 			pr_err("SARU n still %zu pointer is %lx\n", n,
> > 			       (unsigned long)to);
> > 	}
> > 	return n;
> > }
> >
> > I added the pr_err() there to catch the failure:
> > [3.443506] SARU n still 64 pointer is b78db000
> >
> > Also I managed to extract the failing portion of iperf3 into something
> > smaller:
> >
> > int temp;
> > char *x;
> > ssize_t l;
> > FILE *f;
> > char template[] = "/blah-XXXXXX";
> >
> > temp = mkstemp(template);
> > if (temp < 0)
> > 	panic("mkstemp");
> > if (unlink(template) < 0)
> > 	panic("unlink");
> > if (ftruncate(temp, 0x2) < 0)
> > 	panic("ftruncate");
> > x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0);
> > if (x == MAP_FAILED)
> > 	panic("mmap");
> > f = fopen("/dev/urandom", "rb");
> > if (!f)
> > 	panic("fopen");
> > setbuf(f, NULL);
> > if (fread(x, 1, 0x2, f) != 0x2)
> > 	panic("fread");
> 
> Does that fail for you reliably?
> 
> It succeeds for me running under qemu ppce500, though I'm not using your
> kernel config yet.

Yes, every time without fail, across two systems and two qemu builds.

Jason


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Jason A. Donenfeld
On Tue, Oct 11, 2022 at 12:44:20PM +1100, Michael Ellerman wrote:
> "Jason A. Donenfeld"  writes:
> > Hi Andrew,
> >
> > On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote:
> >> Thanks for bisecting, this is interesting! Could you provide your
> >> .config and the environment you're running in? Your reproducer doesn't
> >> seem to trigger it on my baremetal POWER8 pseries_le_defconfig.
> >
> > Sure.
> >
> > .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE)
> > Toolchain: 
> > https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz
> >
> > You can also just run:
> >
> >   ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
> >
> > And that'll assemble the whole thing.
> 
> I tried that :)
> 
> What host OS are you running that on?
> 
> I get:
> 
>   mkdir -p /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc
>   powerpc-linux-musl-gcc -o /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init -O3 -pipe  -std=gnu11 init.c
>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory
>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crti.o: No such file or directory
>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crtbeginS.o: No such file or directory
>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc
>   /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc
>   collect2: error: ld returned 1 exit status

Here's what happened:

- You started the thing and the kernel compile complained about an
  unclean tree.
- You ran mrproper.
- You tried to run the thing again.

amirite?

If so, what happened is that mrproper deleted the .o files from the
toolchain. Solution:

  ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) clean
  ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

Let me know how that goes.

Jason


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Michael Ellerman
"Jason A. Donenfeld"  writes:
> On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote:
>> Hi Michael,
>> 
>> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
>> > powerpc updates for 6.1
>> > 
>> >  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
>> > 
>> >  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
>> > 
>> >  - Add support for syscall wrappers.
>> > 
>> >  - Add support for KFENCE on 64-bit.
>> > 
>> >  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
>> > 
>> >  - Support execute-only memory when using the Radix MMU (P9 or later).
>> > 
>> >  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
>> > 
>> >  - Updates to our linker script to move more data into read-only sections.
>> > 
>> >  - Allow the VDSO to be randomised on 32-bit.
>> > 
>> >  - Many other small features and fixes.
>> 
>> FYI, something in here broke the wireguard test suite, which runs the
>> iperf3 networking utility. The full log is here [1], but the relevant part
>> is: 
>> 
>> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
>> Connecting to host 192.168.241.2, port 5201
>> iperf3: error - failed to read /dev/urandom: Bad address
>> 
>> I'll see if I can narrow it down a bit more and bisect. But just FYI, in
>> case you have an intuition.
>
> Huh. From iov_iter.c:
>
> static int copyout(void __user *to, const void *from, size_t n)
> {
> 	size_t before = n;
> 	if (should_fail_usercopy())
> 		return n;
> 	if (access_ok(to, n)) {
> 		instrument_copy_to_user(to, from, n);
> 		n = raw_copy_to_user(to, from, n);
> 		if (n == before)
> 			pr_err("SARU n still %zu pointer is %lx\n", n,
> 			       (unsigned long)to);
> 	}
> 	return n;
> }
>
> I added the pr_err() there to catch the failure:
> [3.443506] SARU n still 64 pointer is b78db000
>
> Also I managed to extract the failing portion of iperf3 into something
> smaller:
>
> int temp;
> char *x;
> ssize_t l;
> FILE *f;
> char template[] = "/blah-XXXXXX";
>
> temp = mkstemp(template);
> if (temp < 0)
> 	panic("mkstemp");
> if (unlink(template) < 0)
> 	panic("unlink");
> if (ftruncate(temp, 0x2) < 0)
> 	panic("ftruncate");
> x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0);
> if (x == MAP_FAILED)
> 	panic("mmap");
> f = fopen("/dev/urandom", "rb");
> if (!f)
> 	panic("fopen");
> setbuf(f, NULL);
> if (fread(x, 1, 0x2, f) != 0x2)
> 	panic("fread");

Does that fail for you reliably?

It succeeds for me running under qemu ppce500, though I'm not using your
kernel config yet.

cheers


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Michael Ellerman
"Jason A. Donenfeld"  writes:
> Hi Andrew,
>
> On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote:
>> Thanks for bisecting, this is interesting! Could you provide your
>> .config and the environment you're running in? Your reproducer doesn't
>> seem to trigger it on my baremetal POWER8 pseries_le_defconfig.
>
> Sure.
>
> .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE)
> Toolchain: 
> https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz
>
> You can also just run:
>
>   ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
>
> And that'll assemble the whole thing.

I tried that :)

What host OS are you running that on?

I get:

  mkdir -p /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc
  powerpc-linux-musl-gcc -o /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init -O3 -pipe  -std=gnu11 init.c
  /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory
  /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crti.o: No such file or directory
  /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crtbeginS.o: No such file or directory
  /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc
  /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc
  collect2: error: ld returned 1 exit status

cheers


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Jason A. Donenfeld
Hi Andrew,

On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote:
> Thanks for bisecting, this is interesting! Could you provide your
> .config and the environment you're running in? Your reproducer doesn't
> seem to trigger it on my baremetal POWER8 pseries_le_defconfig.

Sure.

.config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE)
Toolchain: 
https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz

You can also just run:

  ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

And that'll assemble the whole thing.

Jason


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Andrew Donnellan
On Mon, 2022-10-10 at 16:26 -0600, Jason A. Donenfeld wrote:
> 
> Bisected:
> 
> 7e92e01b724526b98cbc7f03dd4afa0295780d56 is the first bad commit
> commit 7e92e01b724526b98cbc7f03dd4afa0295780d56
> Author: Rohan McLure 
> Date:   Wed Sep 21 16:56:01 2022 +1000
> 
>     powerpc: Provide syscall wrapper
> 
>     Implement syscall wrapper as per s390, x86, arm64. When enabled
>     cause handlers to accept parameters from a stack frame rather than
>     from user scratch register state. This allows for user registers to be
>     safely cleared in order to reduce caller influence on speculation
>     within syscall routine. The wrapper is a macro that emits syscall
>     handler symbols that call into the target handler, obtaining its
>     parameters from a struct pt_regs on the stack.
> 
>     As registers are already saved to the stack prior to calling
>     system_call_exception, it appears that this function is executed more
>     efficiently with the new stack-pointer convention than with parameters
>     passed by registers, avoiding the allocation of a stack frame for this
>     method. On a 32-bit system, we see >20% performance increases on the
>     null_syscall microbenchmark, and on a Power 8 the performance gains
>     amortise the cost of clearing and restoring registers which is
>     implemented at the end of this series, seeing final result of ~5.6%
>     performance improvement on null_syscall.
> 
>     Syscalls are wrapped in this fashion on all platforms except for the
>     Cell processor as this commit does not provide SPU support. This can be
>     quickly fixed in a successive patch, but requires spu_sys_callback to
>     allocate a pt_regs structure to satisfy the wrapped calling convention.
> 
>     Co-developed-by: Andrew Donnellan 
>     Signed-off-by: Andrew Donnellan 
>     Signed-off-by: Rohan McLure 
>     Reviewed-by: Nicholas Piggin 
>     [mpe: Make incompatible with COMPAT to retain clearing of high bits of args]
>     Signed-off-by: Michael Ellerman 
>     Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmcl...@linux.ibm.com

Thanks for bisecting, this is interesting! Could you provide your
.config and the environment you're running in? Your reproducer doesn't
seem to trigger it on my baremetal POWER8 pseries_le_defconfig.

-- 
Andrew Donnellan    OzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited
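The quoted commit message justifies the change with numbers from the null_syscall microbenchmark. As a rough illustration of what a benchmark of that kind measures, the sketch below times a loop of trivial system calls and reports the average cost per call. It is only a stand-in under stated assumptions, not the benchmark the authors used: the choice of getppid issued through syscall(2), the helper name null_syscall_ns and the iteration count are all hypothetical.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * trivial system call. getppid does almost no work in the kernel, so the
 * loop is dominated by syscall entry/exit overhead, which is the path the
 * wrapper commit changes. Returns 0.0 when iters is 0.
 */
static double null_syscall_ns(unsigned long iters)
{
	struct timespec start, end;
	int64_t ns;

	if (iters == 0)
		return 0.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (unsigned long i = 0; i < iters; i++)
		syscall(SYS_getppid);	/* raw syscall, no libc-side shortcut */
	clock_gettime(CLOCK_MONOTONIC, &end);

	ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000
	     + (end.tv_nsec - start.tv_nsec);
	return (double)ns / (double)iters;
}
```

Running something like null_syscall_ns(1000000) on kernels built with and without the bisected commit would give a crude before/after comparison; the >20% and ~5.6% figures in the commit come from the authors' own measurements, not from this sketch.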



Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Jason A. Donenfeld
On Mon, Oct 10, 2022 at 02:03:09PM -0600, Jason A. Donenfeld wrote:
> On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote:
> > Hi Michael,
> > 
> > On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
> > > powerpc updates for 6.1
> > > 
> > >  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
> > > 
> > >  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
> > > 
> > >  - Add support for syscall wrappers.
> > > 
> > >  - Add support for KFENCE on 64-bit.
> > > 
> > >  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
> > > 
> > >  - Support execute-only memory when using the Radix MMU (P9 or later).
> > > 
> > >  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
> > > 
> > >  - Updates to our linker script to move more data into read-only sections.
> > > 
> > >  - Allow the VDSO to be randomised on 32-bit.
> > > 
> > >  - Many other small features and fixes.
> > 
> > FYI, something in here broke the wireguard test suite, which runs the
> > iperf3 networking utility. The full log is here [1], but the relevant part
> > is: 
> > 
> > [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
> > Connecting to host 192.168.241.2, port 5201
> > iperf3: error - failed to read /dev/urandom: Bad address
> > 
> > I'll see if I can narrow it down a bit more and bisect. But just FYI, in
> > case you have an intuition.
> 
> Huh. From iov_iter.c:
> 
> static int copyout(void __user *to, const void *from, size_t n)
> {
> 	size_t before = n;
> 	if (should_fail_usercopy())
> 		return n;
> 	if (access_ok(to, n)) {
> 		instrument_copy_to_user(to, from, n);
> 		n = raw_copy_to_user(to, from, n);
> 		if (n == before)
> 			pr_err("SARU n still %zu pointer is %lx\n", n,
> 			       (unsigned long)to);
> 	}
> 	return n;
> }
> 
> I added the pr_err() there to catch the failure:
> [3.443506] SARU n still 64 pointer is b78db000
> 
> Also I managed to extract the failing portion of iperf3 into something
> smaller:
> 
> int temp;
> char *x;
> ssize_t l;
> FILE *f;
> char template[] = "/blah-XXXXXX";
> 
> temp = mkstemp(template);
> if (temp < 0)
> 	panic("mkstemp");
> if (unlink(template) < 0)
> 	panic("unlink");
> if (ftruncate(temp, 0x2) < 0)
> 	panic("ftruncate");
> x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0);
> if (x == MAP_FAILED)
> 	panic("mmap");
> f = fopen("/dev/urandom", "rb");
> if (!f)
> 	panic("fopen");
> setbuf(f, NULL);
> if (fread(x, 1, 0x2, f) != 0x2)
> 	panic("fread");
> 
> Jason

Bisected:

7e92e01b724526b98cbc7f03dd4afa0295780d56 is the first bad commit
commit 7e92e01b724526b98cbc7f03dd4afa0295780d56
Author: Rohan McLure 
Date:   Wed Sep 21 16:56:01 2022 +1000

powerpc: Provide syscall wrapper

Implement syscall wrapper as per s390, x86, arm64. When enabled
cause handlers to accept parameters from a stack frame rather than
from user scratch register state. This allows for user registers to be
safely cleared in order to reduce caller influence on speculation
within syscall routine. The wrapper is a macro that emits syscall
handler symbols that call into the target handler, obtaining its
parameters from a struct pt_regs on the stack.

As registers are already saved to the stack prior to calling
system_call_exception, it appears that this function is executed more
efficiently with the new stack-pointer convention than with parameters
passed by registers, avoiding the allocation of a stack frame for this
method. On a 32-bit system, we see >20% performance increases on the
null_syscall microbenchmark, and on a Power 8 the performance gains
amortise the cost of clearing and restoring registers which is
implemented at the end of this series, seeing final result of ~5.6%
performance improvement on null_syscall.

Syscalls are wrapped in this fashion on all platforms except for the
Cell processor as this commit does not provide SPU support. This can be
quickly fixed in a successive patch, but requires spu_sys_callback to
allocate a pt_regs structure to satisfy the wrapped calling convention.

Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
[mpe: Make incompatible with COMPAT to retain clearing of high bits of args]
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmcl...@linux.ibm.com

 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/syscall.h |  4 +++
 arch/powerpc/include/asm/syscall_wrapper.h | 51 
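The commit text describes the wrapper as a macro that emits a handler symbol which pulls its arguments out of a saved struct pt_regs instead of taking them from live registers. The userspace sketch below models only that idea; it is not the kernel's syscall_wrapper.h. The structure fake_pt_regs, the macro DEFINE_WRAPPED_SYSCALL2 and the example handler are hypothetical. The one real detail it borrows is the powerpc convention that syscall arguments arrive in r3 onwards.

```c
/* Hypothetical stand-in for the saved register frame: just the GPRs. */
struct fake_pt_regs {
	unsigned long gpr[32];
};

/*
 * Hypothetical two-argument wrapper generator. It emits:
 *   do_sys_<name>(): the real handler, written with typed arguments;
 *   sys_<name>(regs): the entry point, which reads those arguments from
 *   the saved frame (r3 and r4) rather than from live registers.
 * The trailing prototype lets the caller supply the handler body.
 */
#define DEFINE_WRAPPED_SYSCALL2(name, t1, a1, t2, a2)			\
	static long do_sys_##name(t1 a1, t2 a2);			\
	static long sys_##name(const struct fake_pt_regs *regs)	\
	{								\
		return do_sys_##name((t1)regs->gpr[3],			\
				     (t2)regs->gpr[4]);			\
	}								\
	static long do_sys_##name(t1 a1, t2 a2)

/* Example handler: the body is written as if it got plain arguments. */
DEFINE_WRAPPED_SYSCALL2(add, long, a, long, b)
{
	return a + b;
}
```

With struct fake_pt_regs regs = { .gpr = { [3] = 40, [4] = 2 } }, calling sys_add(&regs) yields 42. Because the handler only ever sees the copy in the frame, entry code is free to clear the live user registers, which is the speculation-hardening motivation the commit gives.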

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Jason A. Donenfeld
On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote:
> Hi Michael,
> 
> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
> > powerpc updates for 6.1
> > 
> >  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
> > 
> >  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
> > 
> >  - Add support for syscall wrappers.
> > 
> >  - Add support for KFENCE on 64-bit.
> > 
> >  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
> > 
> >  - Support execute-only memory when using the Radix MMU (P9 or later).
> > 
> >  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
> > 
> >  - Updates to our linker script to move more data into read-only sections.
> > 
> >  - Allow the VDSO to be randomised on 32-bit.
> > 
> >  - Many other small features and fixes.
> 
> FYI, something in here broke the wireguard test suite, which runs the
> iperf3 networking utility. The full log is here [1], but the relevant part
> is: 
> 
> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
> Connecting to host 192.168.241.2, port 5201
> iperf3: error - failed to read /dev/urandom: Bad address
> 
> I'll see if I can narrow it down a bit more and bisect. But just FYI, in
> case you have an intuition.

Huh. From iov_iter.c:

static int copyout(void __user *to, const void *from, size_t n)
{
	size_t before = n;
	if (should_fail_usercopy())
		return n;
	if (access_ok(to, n)) {
		instrument_copy_to_user(to, from, n);
		n = raw_copy_to_user(to, from, n);
		if (n == before)
			pr_err("SARU n still %zu pointer is %lx\n", n,
			       (unsigned long)to);
	}
	return n;
}

I added the pr_err() there to catch the failure:
[3.443506] SARU n still 64 pointer is b78db000
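raw_copy_to_user() returns the number of bytes it could not copy, so the debug check n == before fires only when not a single byte reached user space; "n still 64" therefore means a 64-byte copy moved nothing. The model below mirrors that convention in plain C to make the arithmetic concrete. Everything in it is hypothetical: model_raw_copy() simulates a fault after fault_at bytes, and model_read() is a simplified caller that returns the bytes copied, or -EFAULT ("Bad address") when nothing was copied.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of raw_copy_to_user(): copies up to n bytes but
 * "faults" after fault_at bytes. Returns the number of bytes NOT copied,
 * matching the convention of the kernel's copy_to_user() family.
 */
static size_t model_raw_copy(char *to, const char *from, size_t n,
			     size_t fault_at)
{
	size_t ok = n < fault_at ? n : fault_at;

	memcpy(to, from, ok);
	return n - ok;
}

/*
 * Hypothetical read-path caller: report the bytes copied, or -EFAULT when
 * the copy moved nothing at all (the same condition as the SARU pr_err()).
 */
static long model_read(char *to, const char *from, size_t n, size_t fault_at)
{
	size_t before = n;
	size_t left = model_raw_copy(to, from, n, fault_at);

	if (left == before)
		return -EFAULT;
	return (long)(before - left);
}
```

A partial copy (fault_at = 16 for a 64-byte request) still returns 16, so only a copy that fails outright turns into the "Bad address" error iperf3 reported.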

Also I managed to extract the failing portion of iperf3 into something
smaller:

int temp;
char *x;
ssize_t l;
FILE *f;
char template[] = "/blah-XXXXXX";

temp = mkstemp(template);
if (temp < 0)
	panic("mkstemp");
if (unlink(template) < 0)
	panic("unlink");
if (ftruncate(temp, 0x2) < 0)
	panic("ftruncate");
x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0);
if (x == MAP_FAILED)
	panic("mmap");
f = fopen("/dev/urandom", "rb");
if (!f)
	panic("fopen");
setbuf(f, NULL);
if (fread(x, 1, 0x2, f) != 0x2)
	panic("fread");
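For running the reproducer outside the WireGuard test harness, one way to wrap the fragment into a standalone function is sketched below. It rests on a few assumptions: the panic() calls are replaced by perror() plus a -1 return (the harness's own panic() helper is not shown in the thread), the temporary file is created in a caller-supplied directory rather than /, and the mkstemp() template ends in the six X characters that function requires. The name run_reproducer and its dir parameter are hypothetical. On an unaffected kernel it should return 0; later in the thread the failure was found to appear only when the binary is built with musl.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Standalone version of the iperf3-derived reproducer. Returns 0 if the
 * 0x2-byte fread() from /dev/urandom into a MAP_PRIVATE file mapping
 * succeeds, -1 on any failure (with a perror() message).
 */
static int run_reproducer(const char *dir)
{
	char template[4096];
	char *x;
	FILE *f;
	int temp, ret = -1;

	/* mkstemp() requires the template to end in exactly "XXXXXX". */
	if (snprintf(template, sizeof(template), "%s/blah-XXXXXX", dir) >=
	    (int)sizeof(template))
		return -1;

	temp = mkstemp(template);
	if (temp < 0) {
		perror("mkstemp");
		return -1;
	}
	if (unlink(template) < 0) {
		perror("unlink");
		goto out_close;
	}
	if (ftruncate(temp, 0x2) < 0) {
		perror("ftruncate");
		goto out_close;
	}
	x = mmap(NULL, 0x2, PROT_READ | PROT_WRITE, MAP_PRIVATE, temp, 0);
	if (x == MAP_FAILED) {
		perror("mmap");
		goto out_close;
	}
	f = fopen("/dev/urandom", "rb");
	if (!f) {
		perror("fopen");
		goto out_unmap;
	}
	/* Unbuffered, as in the original, so the read lands in the mapping. */
	setbuf(f, NULL);
	if (fread(x, 1, 0x2, f) != 0x2)
		perror("fread");	/* iperf3 saw "Bad address" here */
	else
		ret = 0;
	fclose(f);
out_unmap:
	munmap(x, 0x2);
out_close:
	close(temp);
	return ret;
}
```

Calling run_reproducer("/tmp") from main() and checking for 0 is enough; the 0x2 size from the fragment is kept unchanged.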

Jason


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-10 Thread Jason A. Donenfeld
Hi Michael,

On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
> powerpc updates for 6.1
> 
>  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
> 
>  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
> 
>  - Add support for syscall wrappers.
> 
>  - Add support for KFENCE on 64-bit.
> 
>  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
> 
>  - Support execute-only memory when using the Radix MMU (P9 or later).
> 
>  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
> 
>  - Updates to our linker script to move more data into read-only sections.
> 
>  - Allow the VDSO to be randomised on 32-bit.
> 
>  - Many other small features and fixes.

FYI, something in here broke the wireguard test suite, which runs the
iperf3 networking utility. The full log is here [1], but the relevant part
is: 

[+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
Connecting to host 192.168.241.2, port 5201
iperf3: error - failed to read /dev/urandom: Bad address

I'll see if I can narrow it down a bit more and bisect. But just FYI, in
case you have an intuition.
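The "Bad address" text in the iperf3 error is strerror(EFAULT): the read of /dev/urandom failed with errno set to EFAULT, meaning the kernel could not write into the buffer it was handed. On a healthy kernel the same error can be provoked deliberately by reading into memory that is not mapped. The sketch below does that by mapping a page, unmapping it, and then reading into the dangling pointer; the helper name read_into_unmapped_page and the 64-byte read size are illustrative choices, not anything taken from iperf3.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Read from /dev/urandom into a page that has just been unmapped and
 * return the resulting errno (0 if the read unexpectedly succeeded).
 */
static int read_into_unmapped_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int fd, err = 0;
	char *buf;

	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return errno;
	munmap(buf, (size_t)page);	/* buf is now a dangling user pointer */

	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0)
		return errno;
	if (read(fd, buf, 64) < 0)
		err = errno;		/* expected: EFAULT */
	close(fd);
	return err;
}
```

The difference in the report is that iperf3's buffer was a perfectly valid MAP_PRIVATE file mapping, yet the kernel still behaved as if it were unmapped.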

Jason


[1] https://build.wireguard.com/linux/4de65c5830233e7a4adf2e679510089ec4e210c7/powerpc.log


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag

2022-10-09 Thread pr-tracker-bot
The pull request you sent on Sun, 09 Oct 2022 22:01:39 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> tags/powerpc-6.1-1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4899a36f91a9f9b06878471096bd143e7253006d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html