Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Thu, Oct 13, 2022 at 03:14:08PM +1000, Nicholas Piggin wrote:
> > >
> > > BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
> > > caller is .__flush_tlb_pending+0x40/0xf0
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty
> > > #4
> > > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
> > > Call Trace:
> > > [c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4
> > > (unreliable)
> > > [c44c35d0] [c0fc9550]
> > > .check_preemption_disabled+0x140/0x150
> > > [c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
> > > [c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
> > > [c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
> > > [c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
> > > [c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
> > > [c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
> > > [c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
> > > [c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
> > > [c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
> > > [c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
> > > [c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
> > > [c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60
> > >
> > > This in turn is because __flush_tlb_pending() calls:
> > >
> > > static inline int mm_is_thread_local(struct mm_struct *mm)
> > > {
> > >         return cpumask_equal(mm_cpumask(mm),
> > >                              cpumask_of(smp_processor_id()));
> > > }
> > >
> > > __flush_tlb_pending() has a comment about this:
> > >
> > >  * Must be called from within some kind of spinlock/non-preempt region...
> > >  */
> > > void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
> > >
> > > So I guess that didn't happen for some reason? Maybe this is indicative
> > > of some lock imbalance that then gets hit later?
> >
> > I managed to bisect that problem. Unfortunately it points to the
> > scheduler merge. No idea what to do about that. Any idea ?
> > I am copying Peter and Ingo for comments.
> >
> > # first bad commit: [30c37f69abf935b0228b8411713737377d9e] Merge tag
> > 'sched-core-2022-10-07' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> This might be a red herring because I can reproduce without it.
> I think we can fix this with some preempt critical sections, they
> don't look too much of a problem.
>

Do you refer to the bisect of the BUG: message above, or to the other
problem ? I can try to repeat the bisect with some retries if you think
that 30c37f69a isn't responsible for "BUG: using smp_processor_id() in
preemptible [] code".

Thanks,
Guenter
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote:
> On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> > Guenter Roeck writes:
> > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> > >>
> > >> I've also managed to not hit this bug a few times. When it triggers,
> > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > >> optimized if possible.", there's a long hang - tens seconds before it
> > >> continues. When it doesn't trigger, there's no hang at that point in the
> > >> boot process.
> > >>
> > >
> > > I managed to bisect the problem. See below for results. Reverting the
> > > offending patch fixes the problem for me.
> >
> > Thanks.
> >
> > This is probably down to me/us not testing with PREEMPT enabled enough.
> >
> Not sure. My configuration has
>
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set

Okay I reproduced it, just takes a while to hit.

Thanks,
Nick
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On 10/12/22 22:03, Nicholas Piggin wrote:
> On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote:
> > On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> > > Guenter Roeck writes:
> > > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> > > > >
> > > > > I've also managed to not hit this bug a few times. When it triggers,
> > > > > after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > > > > optimized if possible.", there's a long hang - tens seconds before it
> > > > > continues. When it doesn't trigger, there's no hang at that point in the
> > > > > boot process.
> > > > >
> > > >
> > > > I managed to bisect the problem. See below for results. Reverting the
> > > > offending patch fixes the problem for me.
> > >
> > > Thanks.
> > >
> > > This is probably down to me/us not testing with PREEMPT enabled enough.
> > >
> > Not sure. My configuration has
> >
> > CONFIG_PREEMPT_NONE=y
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > # CONFIG_PREEMPT is not set
>
> Thanks very much for helping with this. The config snippet you posted here
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html
> has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig?
>
> I can't reproduce this so far using your config and qemu command line,
> but the patch you've bisected it to definitely could cause this. I'll
> keep trying...
>

Uuh, sorry, I think I got confused with running multiple bisects
on the same branch, and took the above from a different bisect run.
You are correct, PREEMPT is enabled in the configuration.

Timing is definitely involved; I see the problem more often on a loaded
system. To bisect it, I had to repeat the test for each bisect step
several times (I set the limit to 20 retries; that gave me reliable
results).

Guenter
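
The retry approach described in this message (repeat the test up to 20 times per bisect step, and only call a commit good if no run fails) can be wrapped for "git bisect run". What follows is a minimal sketch and not the script used in this thread: the function name retry_until_fail and the helper boot_test.sh are hypothetical, and boot_test.sh is assumed to build the kernel, boot it under qemu, and exit non-zero when the BUG or soft lockup message appears.

```shell
#!/bin/sh
# Sketch only: retry_until_fail and boot_test.sh are illustrative names,
# not anything taken from the thread.
#
# retry_until_fail MAX_TRIES CMD [ARGS...]
#   Runs CMD up to MAX_TRIES times.
#   Returns 1 as soon as one run fails (git bisect run treats 1 as "bad").
#   Returns 0 only if every run passed (git bisect run treats 0 as "good").
retry_until_fail() {
    max_tries=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if ! "$@"; then
            echo "attempt $attempt of $max_tries: problem reproduced" >&2
            return 1
        fi
        attempt=$((attempt + 1))
    done
    return 0
}

# Intended use inside a script handed to "git bisect run"
# (left commented out so the function can be sourced and tried on its own):
#
#   retry_until_fail 20 ./boot_test.sh
#   exit $?
```

Because the problem is timing dependent, a single passing boot says little; the wrapper trades bisect time for confidence in each "good" verdict. A step that cannot be built or booted at all would be better reported with exit code 125, which "git bisect run" interprets as "skip".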
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Thu, Oct 13, 2022 at 03:03:14PM +1000, Nicholas Piggin wrote:
> On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote:
> > On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> > > Guenter Roeck writes:
> > > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> > > >>
> > > >> I've also managed to not hit this bug a few times. When it triggers,
> > > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > > >> optimized if possible.", there's a long hang - tens seconds before it
> > > >> continues. When it doesn't trigger, there's no hang at that point in
> > > >> the
> > > >> boot process.
> > > >>
> > > >
> > > > I managed to bisect the problem. See below for results. Reverting the
> > > > offending patch fixes the problem for me.
> > >
> > > Thanks.
> > >
> > > This is probably down to me/us not testing with PREEMPT enabled enough.
> > >
> > Not sure. My configuration has
> >
> > CONFIG_PREEMPT_NONE=y
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > # CONFIG_PREEMPT is not set
>
> Thanks very much for helping with this. The config snippet you posted here
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html
> has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig?
>
> I can't reproduce this so far using your config and qemu command line,
> but the patch you've bisected it to definitely could cause this. I'll
> keep trying...

Voila https://xn--4db.cc/dt00j0mt this repros it for me.

>
> Thanks,
> Nick
>
> [...]
> > > > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0]
> > > > powerpc/64/interrupt: Fix return to masked context after hard-mask irq
> > > > becomes pending
>
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Thu Oct 13, 2022 at 4:37 AM AEST, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 10:48:26AM -0700, Guenter Roeck wrote:
> > > I've also managed to not hit this bug a few times. When it triggers,
> > > after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > > optimized if possible.", there's a long hang - tens seconds before it
> > > continues. When it doesn't trigger, there's no hang at that point in the
> > > boot process.
> > >
> >
> > That probably explains why my attempts to bisect the problem were
> > unsuccessful.
>
> So I just did this:
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 2fe28eeb2f38..2d70bc09db7e 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1212,6 +1212,7 @@ static void __cold try_to_generate_entropy(void)
>         struct entropy_timer_state stack;
>         unsigned int i, num_different = 0;
>         unsigned long last = random_get_entropy();
> +       return;
>
>         for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) {
>                 stack.entropy = random_get_entropy();
>
> And then ran it, and now we get the lockup from the idle process:

Yep that rules out the random code. And really if it was calling
schedule() it shouldn't be getting a softlockup anyway.

Thanks,
Nick
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Thu Oct 13, 2022 at 2:43 PM AEST, Guenter Roeck wrote:
> On 10/12/22 10:20, Jason A. Donenfeld wrote:
> > On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
> >> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> >>> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> >>>> NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> >>>> LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> >>>> Call Trace:
> >>>> [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0
> >>>> (unreliable)
> >>>> [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> >>>> [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> >>>> [c7df3a50] [c092f0dc]
> >>>> .try_to_generate_entropy+0x118/0x174
> >>>> [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> >>>> [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> >>>> [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> >>>> [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> >>>> [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> >>>
> >>> Obviously the first couple lines of this concern me a bit. But I think
> >>> actually this might just be a catalyst for another bug. You could view
> >>> that function as basically just:
> >>>
> >>>     while (something)
> >>>         schedule();
> >>>
> >>> And I guess in the process of calling the scheduler a lot, which toggles
> >>> interrupts a lot, something got wedged.
> >>>
> >>> Curious, though, I did try to reproduce this, to no avail. My .config is
> >>> https://xn--4db.cc/rBvHWfDZ . What's yours?
> >>>
> >>
> >> Attached. My qemu command line is
> >
> > Okay, thanks, I reproduced it. In this case, I suspect
> > try_to_generate_entropy() is just the messenger. There's an earlier
> > problem:
> >
> > BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
> > caller is .__flush_tlb_pending+0x40/0xf0
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
> > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
> > Call Trace:
> > [c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
> > [c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150
> > [c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
> > [c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
> > [c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
> > [c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
> > [c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
> > [c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
> > [c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
> > [c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
> > [c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
> > [c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
> > [c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
> > [c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60
> >
> > This in turn is because __flush_tlb_pending() calls:
> >
> > static inline int mm_is_thread_local(struct mm_struct *mm)
> > {
> >         return cpumask_equal(mm_cpumask(mm),
> >                              cpumask_of(smp_processor_id()));
> > }
> >
> > __flush_tlb_pending() has a comment about this:
> >
> >  * Must be called from within some kind of spinlock/non-preempt region...
> >  */
> > void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
> >
> > So I guess that didn't happen for some reason? Maybe this is indicative
> > of some lock imbalance that then gets hit later?
>
> I managed to bisect that problem. Unfortunately it points to the
> scheduler merge. No idea what to do about that. Any idea ?
> I am copying Peter and Ingo for comments.
>
> # first bad commit: [30c37f69abf935b0228b8411713737377d9e] Merge tag
> 'sched-core-2022-10-07' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

This might be a red herring because I can reproduce without it.
I think we can fix this with some preempt critical sections, they
don't look too much of a problem.

I don't know why it's not showing up earlier than this release,
I'll look into it a bit more.

Thanks,
Nick
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote:
> On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> > Guenter Roeck writes:
> > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> > >>
> > >> I've also managed to not hit this bug a few times. When it triggers,
> > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > >> optimized if possible.", there's a long hang - tens seconds before it
> > >> continues. When it doesn't trigger, there's no hang at that point in the
> > >> boot process.
> > >>
> > >
> > > I managed to bisect the problem. See below for results. Reverting the
> > > offending patch fixes the problem for me.
> >
> > Thanks.
> >
> > This is probably down to me/us not testing with PREEMPT enabled enough.
> >
> Not sure. My configuration has
>
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set

Thanks very much for helping with this. The config snippet you posted here
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html
has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig?

I can't reproduce this so far using your config and qemu command line,
but the patch you've bisected it to definitely could cause this. I'll
keep trying...

Thanks,
Nick

[...]
> > > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0]
> > > powerpc/64/interrupt: Fix return to masked context after hard-mask irq
> > > becomes pending
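
Since part of this exchange turns on which preemption model the test kernel was really built with, it can help to check the final .config (after "make olddefconfig") rather than the snippet it was generated from. A small sketch, assuming a POSIX shell; the function name preempt_model is made up for illustration and only the three options quoted in this thread are checked:

```shell
#!/bin/sh
# Sketch only: preempt_model is an illustrative helper, not an existing tool.
#
# preempt_model FILE
#   Prints the first preemption option that FILE sets to "y".
#   Lines such as "# CONFIG_PREEMPT is not set" do not match because the
#   pattern is anchored at both ends, and CONFIG_PREEMPT=y cannot be
#   confused with CONFIG_PREEMPT_NONE=y for the same reason.
preempt_model() {
    cfg=$1
    for opt in CONFIG_PREEMPT_NONE CONFIG_PREEMPT_VOLUNTARY CONFIG_PREEMPT; do
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "$opt"
            return 0
        fi
    done
    echo "unknown"
    return 1
}

# Typical use, run from the top of a kernel tree (commented out here):
#
#   cp /path/to/config-snippet .config
#   make ARCH=powerpc olddefconfig
#   preempt_model .config
```

Running the check against the file the build actually used avoids the kind of mix-up between bisect runs that is discussed in this thread.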
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On 10/12/22 10:20, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
> > On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> > > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > > > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > > > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > > > Call Trace:
> > > > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > > > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > > > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > > > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > > > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > > > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > > > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > > > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > > > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> > >
> > > Obviously the first couple lines of this concern me a bit. But I think
> > > actually this might just be a catalyst for another bug. You could view
> > > that function as basically just:
> > >
> > >     while (something)
> > >         schedule();
> > >
> > > And I guess in the process of calling the scheduler a lot, which toggles
> > > interrupts a lot, something got wedged.
> > >
> > > Curious, though, I did try to reproduce this, to no avail. My .config is
> > > https://xn--4db.cc/rBvHWfDZ . What's yours?
> > >
> >
> > Attached. My qemu command line is
>
> Okay, thanks, I reproduced it. In this case, I suspect
> try_to_generate_entropy() is just the messenger. There's an earlier
> problem:
>
> BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
> caller is .__flush_tlb_pending+0x40/0xf0
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
> Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
> Call Trace:
> [c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
> [c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150
> [c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
> [c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
> [c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
> [c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
> [c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
> [c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
> [c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
> [c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
> [c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
> [c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
> [c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
> [c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60
>
> This in turn is because __flush_tlb_pending() calls:
>
> static inline int mm_is_thread_local(struct mm_struct *mm)
> {
>         return cpumask_equal(mm_cpumask(mm),
>                              cpumask_of(smp_processor_id()));
> }
>
> __flush_tlb_pending() has a comment about this:
>
>  * Must be called from within some kind of spinlock/non-preempt region...
>  */
> void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
>
> So I guess that didn't happen for some reason? Maybe this is indicative
> of some lock imbalance that then gets hit later?

I managed to bisect that problem. Unfortunately it points to the
scheduler merge. No idea what to do about that. Any idea ?
I am copying Peter and Ingo for comments.

Guenter

---
# bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
# good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect start 'HEAD' 'v6.0'
# good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0
# good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect good f01603979a4afaad7504a728918b678d572cda9e
# bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547
# good: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
git bisect good 493ffd6605b2d3d4dc7008ab927dba319f36671f
# bad: [cdf072acb5baa18e5b05bdf3f13d6481f62396fc] Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
git bisect bad cdf072acb5baa18e5b05bdf3f13d6481f62396fc
# bad: [55be6084c8e0e0ada9278c2ab60b7a584378efda] Merge tag 'timers-core-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote:
> Guenter Roeck writes:
> > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
> >>
> >> I've also managed to not hit this bug a few times. When it triggers,
> >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> >> optimized if possible.", there's a long hang - tens seconds before it
> >> continues. When it doesn't trigger, there's no hang at that point in the
> >> boot process.
> >>
> >
> > I managed to bisect the problem. See below for results. Reverting the
> > offending patch fixes the problem for me.
>
> Thanks.
>
> This is probably down to me/us not testing with PREEMPT enabled enough.
>

Not sure. My configuration has

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

Guenter

> cheers
>
> > ---
> > # bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag
> > 'mm-hotfixes-stable-2022-10-11' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
> > git bisect start 'HEAD' 'v6.0'
> > # good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1'
> > of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> > git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0
> > # good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag
> > 'gpio-updates-for-v6.1-rc1' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
> > git bisect good f01603979a4afaad7504a728918b678d572cda9e
> > # bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> > git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547
> > # bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag
> > 'ucount-rlimits-cleanups-for-v5.19' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
> > git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f
> > # good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag
> > 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
> > git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0
> > # bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly
> > disable usage of SPE instructions
> > git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23
> > # good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put
> > sys_call_table in .data.rel.ro if RELOCATABLE
> > git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e
> > # good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic
> > fallocate compatibility syscall
> > git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3
> > # good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt:
> > Fix false warning in context tracking due to idle state
> > git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1
> > # bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm
> > helpers from GOT to TOC relative addressing
> > git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf
> > # bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt:
> > avoid BUG/WARN recursion in interrupt entry
> > git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a
> > # bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix
> > return to masked context after hard-mask irq becomes pending
> > git bisect bad e485f6c751e0a969327336c635ca602feea117f0
> > # good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs
> > hard disabled in boot paca
> > git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2
> > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0]
> > powerpc/64/interrupt: Fix return to masked context after hard-mask irq
> > becomes pending
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
Guenter Roeck writes:
> On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
>>
>> I've also managed to not hit this bug a few times. When it triggers,
>> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
>> optimized if possible.", there's a long hang - tens seconds before it
>> continues. When it doesn't trigger, there's no hang at that point in the
>> boot process.
>>
>
> I managed to bisect the problem. See below for results. Reverting the
> offending patch fixes the problem for me.

Thanks.

This is probably down to me/us not testing with PREEMPT enabled enough.

cheers

> ---
> # bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag
> 'mm-hotfixes-stable-2022-10-11' of
> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
> git bisect start 'HEAD' 'v6.0'
> # good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0
> # good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag
> 'gpio-updates-for-v6.1-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
> git bisect good f01603979a4afaad7504a728918b678d572cda9e
> # bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547
> # bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag
> 'ucount-rlimits-cleanups-for-v5.19' of
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
> git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f
> # good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag
> 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
> git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0
> # bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly
> disable usage of SPE instructions
> git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23
> # good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put
> sys_call_table in .data.rel.ro if RELOCATABLE
> git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e
> # good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic
> fallocate compatibility syscall
> git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3
> # good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt: Fix
> false warning in context tracking due to idle state
> git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1
> # bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm
> helpers from GOT to TOC relative addressing
> git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf
> # bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt: avoid
> BUG/WARN recursion in interrupt entry
> git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a
> # bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix
> return to masked context after hard-mask irq becomes pending
> git bisect bad e485f6c751e0a969327336c635ca602feea117f0
> # good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs hard
> disabled in boot paca
> git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2
> # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0]
> powerpc/64/interrupt: Fix return to masked context after hard-mask irq
> becomes pending
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote:
>
> I've also managed to not hit this bug a few times. When it triggers,
> after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> optimized if possible.", there's a long hang - tens seconds before it
> continues. When it doesn't trigger, there's no hang at that point in the
> boot process.
>

I managed to bisect the problem. See below for results. Reverting the
offending patch fixes the problem for me.

Guenter

---
# bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
# good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect start 'HEAD' 'v6.0'
# good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0
# good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect good f01603979a4afaad7504a728918b678d572cda9e
# bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547
# bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f
# good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0
# bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly disable usage of SPE instructions
git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23
# good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE
git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e
# good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic fallocate compatibility syscall
git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3
# good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt: Fix false warning in context tracking due to idle state
git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1
# bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm helpers from GOT to TOC relative addressing
git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf
# bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry
git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a
# bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
git bisect bad e485f6c751e0a969327336c635ca602feea117f0
# good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs hard disabled in boot paca
git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2
# first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Wed, Oct 12, 2022 at 10:48:26AM -0700, Guenter Roeck wrote:
> > I've also managed to not hit this bug a few times. When it triggers,
> > after "kprobes: kprobe jump-optimization is enabled. All kprobes are
> > optimized if possible.", there's a long hang - tens seconds before it
> > continues. When it doesn't trigger, there's no hang at that point in the
> > boot process.
> >
>
> That probably explains why my attempts to bisect the problem were
> unsuccessful.

So I just did this:

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 2fe28eeb2f38..2d70bc09db7e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1212,6 +1212,7 @@ static void __cold try_to_generate_entropy(void)
        struct entropy_timer_state stack;
        unsigned int i, num_different = 0;
        unsigned long last = random_get_entropy();
+       return;

        for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) {
                stack.entropy = random_get_entropy();

And then ran it, and now we get the lockup from the idle process:

udhcpc: started, v1.33.0
udhcpc: sending discover
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #10
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
NIP: c00300f8 LR: c00304e8 CTR: c001a410
REGS: c28c79a8 TRAP: 0900 Not tainted (6.0.0-28380-gde492c83cae0-dirty)
MSR: 8000b032 CR: 24088442 XER:
IRQMASK: 0
GPR00: c00304e8 c28c7b30 c1435500 c28c79a8
GPR04: c13366c0 0010029c
GPR08: c2d3bbb0 c2883d00 c2915500
GPR12: 44088442 c2e0 0007 02295698
GPR16: 039400e8 02295258 02295660 022953d0
GPR20: 02295b10 022b34d0 02295b38 03945500
GPR24: 03945500 0008 c2883d80 c2883d00
GPR28: c290d0c0 0001 c290d018 c290cc78
NIP [c00300f8] .replay_soft_interrupts+0x28/0x2d0
LR [c00304e8] .arch_local_irq_restore+0x148/0x1a0
Call Trace:
[c28c7b30] [c00304e8] .arch_local_irq_restore+0x148/0x1a0 (unreliable)
[c28c7bb0] [c001a388] .arch_cpu_idle+0xb8/0x140
[c28c7c30] [c0fd4940] .default_idle_call+0x80/0xc8
[c28c7ca0] [c0148480] .do_idle+0x150/0x1a0
[c28c7d50] [c0148748] .cpu_startup_entry+0x38/0x40
[c28c7dd0] [c00113a8] .rest_init+0x168/0x170
[c28c7e60] [c2004224] .arch_post_acpi_subsys_init+0x0/0x24
[c28c7ed0] [c2004ba8] .start_kernel+0x8d0/0x924
[c28c7f90] [c000d4ac] start_here_common+0x1c/0x20
Instruction dump:
6000 6000 7c0802a6 f8010010 f821fe01 6000 6000 38610078
e92d0af8 f92101f8 3920 4803a491 <6000> 3920 e9410180 f92101b0
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G L 6.0.0-28380-gde492c83cae0-dirty #10
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
Call Trace:
[c28c74a0] [c0f93b90] .dump_stack_lvl+0x7c/0xc4 (unreliable)
[c28c7530] [c00d2a58] .panic+0x180/0x438
[c28c75e0] [c0232424] .watchdog_timer_fn+0x3a4/0x410
[c28c76a0] [c01cb964] .__hrtimer_run_queues+0x1f4/0x590
[c28c77a0] [c01cc354] .hrtimer_interrupt+0x134/0x300
[c28c7860] [c0021cd4] .timer_interrupt+0x1c4/0x5d0
[c28c7930] [c00302f8] .replay_soft_interrupts+0x228/0x2d0
[c28c7b30] [c00304e8] .arch_local_irq_restore+0x148/0x1a0
[c28c7bb0] [c001a388] .arch_cpu_idle+0xb8/0x140
[c28c7c30] [c0fd4940] .default_idle_call+0x80/0xc8
[c28c7ca0] [c0148480] .do_idle+0x150/0x1a0
[c28c7d50] [c0148748] .cpu_startup_entry+0x38/0x40
[c28c7dd0] [c00113a8] .rest_init+0x168/0x170
[c28c7e60] [c2004224] .arch_post_acpi_subsys_init+0x0/0x24
[c28c7ed0] [c2004ba8] .start_kernel+0x8d0/0x924
[c28c7f90] [c0
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: > On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote: > > On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: > > > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > > > > NIP [c0031630] .replay_soft_interrupts+0x60/0x300 > > > > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0 > > > > Call Trace: > > > > [c7df3870] [c0031964] > > > > .arch_local_irq_restore+0x94/0x1c0 (unreliable) > > > > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50 > > > > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140 > > > > [c7df3a50] [c092f0dc] > > > > .try_to_generate_entropy+0x118/0x174 > > > > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140 > > > > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0 > > > > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130 > > > > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330 > > > > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258 > > > > > > Obviously the first couple lines of this concern me a bit. But I think > > > actually this might just be a catalyst for another bug. You could view > > > that function as basically just: > > > > > > while (something) > > > schedule(); > > > > > > And I guess in the process of calling the scheduler a lot, which toggles > > > interrupts a lot, something got wedged. > > > > > > Curious, though, I did try to reproduce this, to no avail. My .config is > > > https://xn--4db.cc/rBvHWfDZ . What's yours? > > > > > > > Attached. My qemu command line is > > Okay, thanks, I reproduced it. In this case, I suspect > try_to_generate_entropy() is just the messenger. There's an earlier > problem: > That problem is not new but has existed for a couple of releases, and has never caused a hang until now. 
> BUG: using smp_processor_id() in preemptible [] code: swapper/0/1 > caller is .__flush_tlb_pending+0x40/0xf0 > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4 > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac > Call Trace: > [c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable) > [c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150 > [c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0 > [c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30 > [c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160 > [c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230 > [c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0 > [c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140 > [c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78 > [c44c3b50] [c2050f3c] .sock_init+0xe0/0x100 > [c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438 > [c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428 > [c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0 > [c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60 > > This in turn is because __flush_tlb_pending() calls: > > static inline int mm_is_thread_local(struct mm_struct *mm) > { > return cpumask_equal(mm_cpumask(mm), > cpumask_of(smp_processor_id())); > } > > __flush_tlb_pending() has a comment about this: > > * Must be called from within some kind of spinlock/non-preempt region... > */ > void __flush_tlb_pending(struct ppc64_tlb_batch *batch) > > So I guess that didn't happen for some reason? Maybe this is indicative > of some lock imbalance that then gets hit later? > > I've also managed to not hit this bug a few times. When it triggers, > after "kprobes: kprobe jump-optimization is enabled. All kprobes are > optimized if possible.", there's a long hang - tens seconds before it > continues. When it doesn't trigger, there's no hang at that point in the > boot process. > That probably explains why my attempts to bisect the problem were unsuccessful. Thanks, Guenter
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote:
> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > > Call Trace:
> > > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
> >
> > Obviously the first couple lines of this concern me a bit. But I think
> > actually this might just be a catalyst for another bug. You could view
> > that function as basically just:
> >
> >     while (something)
> >         schedule();
> >
> > And I guess in the process of calling the scheduler a lot, which toggles
> > interrupts a lot, something got wedged.
> >
> > Curious, though, I did try to reproduce this, to no avail. My .config is
> > https://xn--4db.cc/rBvHWfDZ . What's yours?
> >
> Attached. My qemu command line is

Okay, thanks, I reproduced it. In this case, I suspect
try_to_generate_entropy() is just the messenger. There's an earlier
problem:

BUG: using smp_processor_id() in preemptible [] code: swapper/0/1
caller is .__flush_tlb_pending+0x40/0xf0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
Call Trace:
[c44c3540] [c0f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable)
[c44c35d0] [c0fc9550] .check_preemption_disabled+0x140/0x150
[c44c3660] [c0073dd0] .__flush_tlb_pending+0x40/0xf0
[c44c36f0] [c0334434] .__apply_to_page_range+0x764/0xa30
[c44c3840] [c006cad0] .change_memory_attr+0xf0/0x160
[c44c38d0] [c02a1d70] .bpf_prog_select_runtime+0x150/0x230
[c44c3970] [c0d405d4] .bpf_prepare_filter+0x504/0x6f0
[c44c3a30] [c0d4085c] .bpf_prog_create+0x9c/0x140
[c44c3ac0] [c2051d9c] .ptp_classifier_init+0x44/0x78
[c44c3b50] [c2050f3c] .sock_init+0xe0/0x100
[c44c3bd0] [c0010bd4] .do_one_initcall+0xa4/0x438
[c44c3cc0] [c2005008] .kernel_init_freeable+0x378/0x428
[c44c3da0] [c00113d8] .kernel_init+0x28/0x1a0
[c44c3e10] [c000ca3c] .ret_from_kernel_thread+0x58/0x60

This in turn is because __flush_tlb_pending() calls:

static inline int mm_is_thread_local(struct mm_struct *mm)
{
	return cpumask_equal(mm_cpumask(mm),
			     cpumask_of(smp_processor_id()));
}

__flush_tlb_pending() has a comment about this:

 * Must be called from within some kind of spinlock/non-preempt region...
 */
void __flush_tlb_pending(struct ppc64_tlb_batch *batch)

So I guess that didn't happen for some reason? Maybe this is indicative
of some lock imbalance that then gets hit later?

I've also managed to not hit this bug a few times. When it triggers,
after "kprobes: kprobe jump-optimization is enabled. All kprobes are
optimized if possible.", there's a long hang - tens of seconds before it
continues. When it doesn't trigger, there's no hang at that point in the
boot process.

Jason
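
To make the warning above concrete, here is a minimal user-space sketch in C of what the debug check behind smp_processor_id() complains about. It is a model, not kernel code: every name ending in _model is hypothetical, a plain bitmask stands in for the cpumask, and the real check also accepts other safe contexts (interrupts disabled, for instance). Wrapping the call in a preempt-disabled section only illustrates the comment on __flush_tlb_pending(); it is not presented as the fix this thread settled on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for per-task and per-CPU kernel state. */
static int preempt_count_model;   /* 0 means the task may be preempted */
static int current_cpu_model;     /* CPU the task is running on right now */
static int debug_warnings;        /* how many times the check "fired" */

static void preempt_disable_model(void) { preempt_count_model++; }
static void preempt_enable_model(void)  { preempt_count_model--; }

/*
 * Model of the debug variant of smp_processor_id(): the CPU number is
 * only stable while preemption is off, so reading it from preemptible
 * context is reported.
 */
static int smp_processor_id_model(void)
{
    if (preempt_count_model == 0)
        debug_warnings++;
    return current_cpu_model;
}

/* Mirrors the shape of mm_is_thread_local(), using a bitmask as cpumask. */
static bool mm_is_thread_local_model(unsigned long mm_cpumask)
{
    return mm_cpumask == (1UL << smp_processor_id_model());
}

/* The same query made from inside a non-preemptible region. */
static bool flush_with_preempt_disabled_model(unsigned long mm_cpumask)
{
    bool local;

    preempt_disable_model();
    local = mm_is_thread_local_model(mm_cpumask);
    preempt_enable_model();
    return local;
}
```

Calling mm_is_thread_local_model() directly bumps debug_warnings, which corresponds to the BUG line in the log; going through flush_with_preempt_disabled_model() gives the same answer without a warning.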
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Wed, Oct 12, 2022 at 10:45:46AM -0600, Jason A. Donenfeld wrote: > On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: > > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > > > NIP [c0031630] .replay_soft_interrupts+0x60/0x300 > > > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0 > > > Call Trace: > > > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 > > > (unreliable) > > > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50 > > > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140 > > > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174 > > > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140 > > > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0 > > > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130 > > > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330 > > > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258 > > > > Obviously the first couple lines of this concern me a bit. But I think > > actually this might just be a catalyst for another bug. You could view > > that function as basically just: > > > > while (something) > > schedule(); > > > > And I guess in the process of calling the scheduler a lot, which toggles > > interrupts a lot, something got wedged. > > > > Curious, though, I did try to reproduce this, to no avail. My .config is > > https://xn--4db.cc/rBvHWfDZ . What's yours? > > I also just tried using your github linux-build-test scripts as a guide > for construction a config -- https://xn--4db.cc/B0HpEQDQ -- and loaded > up your rootfs over sdhci and such, and still couldn't manage to > reproduce. I tried commenting out the line "if (!bits)" in > _credit_init_bits(), so that the rng would never initialize, so that the > schedule() loop would just keep on running indefinitely, but still no > dice. > > But also, I'm running Linus' tree. From your log, I see > "6.0.0-rc2-00163-ga5edf9815dd7". So maybe these bugs got fixed > elsewhere? > Blame me for not attaching the latest crash report. 
Guenter --- BUG: soft lockup - CPU#0 stuck for 23s! [dd:111] Modules linked in: CPU: 0 PID: 111 Comm: dd Not tainted 6.0.0-11414-g49da07006239 #1 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac NIP: c0031630 LR: c0031964 CTR: REGS: c7d5b6a8 TRAP: 0900 Not tainted (6.0.0-11414-g49da07006239) MSR: 80009032 CR: 28002228 XER: IRQMASK: 0 GPR00: c0031964 c7d5b870 c13e5500 c7d5b6a8 GPR04: c125e1c0 c7d5b814 c291d018 GPR08: c2d4bbb8 c7356400 c2d21098 GPR12: 2800 c2e2 100d32e0 100d32b4 GPR16: 100d3301 100d32b9 100d3358 100d32bf GPR20: 2000 100d3372 100d331e c7356c18 GPR24: 0e60 0900 0500 GPR28: 0a00 0f00 0002 0003 NIP [c0031630] .replay_soft_interrupts+0x60/0x300 LR [c0031964] .arch_local_irq_restore+0x94/0x1c0 Call Trace: [c7d5b870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) [c7d5b8f0] [c0f8bac4] .__schedule+0x664/0xa50 [c7d5b9d0] [c0f8bf30] .schedule+0x80/0x140 [c7d5ba50] [c093085c] .try_to_generate_entropy+0x118/0x174 [c7d5bb40] [c092fa64] .urandom_read_iter+0x74/0x140 [c7d5bbc0] [c03b0044] .vfs_read+0x284/0x2d0 [c7d5bcd0] [c03b0d2c] .ksys_read+0xdc/0x130 [c7d5bd80] [c002a88c] .system_call_exception+0x19c/0x330 [c7d5be10] [c000c1d4] system_call_common+0xf4/0x258 --- interrupt: c00 at 0x7fffb5c9d49c NIP: 7fffb5c9d49c LR: 1000da90 CTR: REGS: c7d5be80 TRAP: 0c00 Not tainted (6.0.0-11414-g49da07006239) MSR: 8000f032 CR: 22002422 XER: IRQMASK: 0 GPR00: 0003 76dcc220 7fffb5d97300 GPR04: 101102a0 0020 GPR08: GPR12: 7fffb5e6aac0 100d32e0 100d32b4 GPR16: 100d3301 100d32b9 100d3358 100d32bf GPR20: 2000 100d3372 100d331e GPR24: 7fff 100b3a9c 101102a0 0020 GPR28: 101025c0 0020 NIP [7fffb5c9d49c] 0x7fffb5c9d49c LR [1000da90] 0x1000da90 --- interrupt: c00 Instruction dump: 3b600500 3b800a00 3ba00f00 f8010010 f821fdc1 6000 6000 38610078 e92d0af8 f92101f8 3920 48039745 <6000> 3900 e9410180
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > Call Trace:
> > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
>
> Obviously the first couple lines of this concern me a bit. But I think
> actually this might just be a catalyst for another bug. You could view
> that function as basically just:
>
>     while (something)
>         schedule();
>
> And I guess in the process of calling the scheduler a lot, which toggles
> interrupts a lot, something got wedged.
>
> Curious, though, I did try to reproduce this, to no avail. My .config is
> https://xn--4db.cc/rBvHWfDZ . What's yours?

I also just tried using your github linux-build-test scripts as a guide
for constructing a config -- https://xn--4db.cc/B0HpEQDQ -- and loaded
up your rootfs over sdhci and such, and still couldn't manage to
reproduce. I tried commenting out the line "if (!bits)" in
_credit_init_bits(), so that the rng would never initialize, so that the
schedule() loop would just keep on running indefinitely, but still no
dice.

But also, I'm running Linus' tree. From your log, I see
"6.0.0-rc2-00163-ga5edf9815dd7". So maybe these bugs got fixed
elsewhere?

Jason
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote:
> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> > NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> > LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> > Call Trace:
> > [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> > [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> > [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> > [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> > [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> > [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> > [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> > [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> > [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258
>
> Obviously the first couple lines of this concern me a bit. But I think
> actually this might just be a catalyst for another bug. You could view
> that function as basically just:
>
>     while (something)
>         schedule();
>
> And I guess in the process of calling the scheduler a lot, which toggles
> interrupts a lot, something got wedged.
>
> Curious, though, I did try to reproduce this, to no avail. My .config is
> https://xn--4db.cc/rBvHWfDZ . What's yours?
>

Attached. My qemu command line is

qemu-system-ppc64 -M mac99 -cpu ppc64 \
    -m 1G -kernel vmlinux -snapshot -device e1000,netdev=net0 \
    -netdev user,id=net0 -device sdhci-pci -device sd-card,drive=d0 \
    -drive file=/var/cache/buildbot/ppc64/rootfs.ext2,format=raw,if=none,id=d0 \
    -nographic -vga none -monitor null -no-reboot \
    --append "root=/dev/mmcblk0 rootwait console=tty console=ttyS0"

Qemu version is 7.0. The root file system is from
https://github.com/groeck/linux-build-test/tree/master/rootfs/ppc64

I used to have self tests enabled, but with that (specifically, with
CONFIG_STRING_SELFTEST=y) I now get a different hang, so I disabled
that for the time being.

Guenter

CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_DEBUG=y
CONFIG_NAMESPACES=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EMBEDDED=y
CONFIG_PROFILING=y
CONFIG_PPC64=y
# CONFIG_PPC_POWERNV is not set
CONFIG_DTL=y
# CONFIG_CPU_IDLE is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_BINFMT_MISC=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=m
CONFIG_XFRM_SUB_POLICY=y
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NET_IPIP=m
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_INET6_AH=y
CONFIG_INET6_ESP=y
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_TUNNEL=m
CONFIG_NETFILTER=y
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_U32=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote:
> NIP [c0031630] .replay_soft_interrupts+0x60/0x300
> LR [c0031964] .arch_local_irq_restore+0x94/0x1c0
> Call Trace:
> [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable)
> [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50
> [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140
> [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174
> [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140
> [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0
> [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130
> [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330
> [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258

Obviously the first couple lines of this concern me a bit. But I think
actually this might just be a catalyst for another bug. You could view
that function as basically just:

    while (something)
        schedule();

And I guess in the process of calling the scheduler a lot, which toggles
interrupts a lot, something got wedged.

Curious, though, I did try to reproduce this, to no avail. My .config is
https://xn--4db.cc/rBvHWfDZ . What's yours?

Jason
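
The "while (something) schedule();" shape described above can be sketched in user space, with POSIX sched_yield() standing in for the kernel's schedule(). This is only an analogy under stated assumptions: wait_by_yielding(), countdown_ready() and the max_spins bound are hypothetical names invented for the illustration, and the real try_to_generate_entropy() does more work per iteration than this loop does.

```c
#include <sched.h>
#include <stdbool.h>

/*
 * Keep handing the CPU back to the scheduler until ready() reports
 * that the awaited condition holds, or until max_spins is reached.
 * Returns how many times the loop yielded.
 */
static unsigned long wait_by_yielding(bool (*ready)(void *), void *state,
                                      unsigned long max_spins)
{
    unsigned long spins = 0;

    while (!ready(state) && spins < max_spins) {
        sched_yield();   /* user-space stand-in for schedule() */
        spins++;
    }
    return spins;
}

/*
 * Example condition: becomes true once the counter has run down to
 * zero, decrementing it on every poll that finds it non-zero.
 */
static bool countdown_ready(void *state)
{
    unsigned int *left = state;

    if (*left == 0)
        return true;
    (*left)--;
    return false;
}
```

The point of the sketch is that every pass through the loop re-enters the scheduler, so a condition that takes a long time to become true turns into a very large number of scheduler entries, which is the kind of stress the message above suspects of exposing a latent bug.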
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > Hi Linus, > > Please pull powerpc updates for 6.1. > > No conflicts with your tree. There will be a conflict when you merge the > kbuild tree, due > to us renaming head_fsl_booke.S to head_85xx.S. The resolution is mostly > trivial, > linux-next has the correct result if it's unclear. > Post-merge problems are much more exciting when trying to run mac99 emulations in qemu. Enabling KFENCE results in log messages such as WARNING: inconsistent lock state 6.0.0-rc2-00163-ga5edf9815dd7 #1 Tainted: G N inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes: c2734d68 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600 {IN-SOFTIRQ-W} state was registered at: .lock_acquire+0x20c/0x520 ._raw_spin_lock+0x4c/0x70 .native_hpte_invalidate+0x62c/0x840 .hash__kernel_map_pages+0x450/0x640 .kfence_protect+0x58/0xc0 .kfence_guarded_free+0x374/0x5a0 .__slab_free+0x340/0x670 .__d_free+0x2c/0x50 .rcu_core+0x3f4/0x1750 .__do_softirq+0x1dc/0x7dc .do_softirq_own_stack+0x40/0x60 0xc775bca0 .irq_exit+0x1e8/0x220 .timer_interrupt+0x284/0x700 decrementer_common_virt+0x208/0x210 irq event stamp: 243607 hardirqs last enabled at (243607): [] .__slab_free+0x324/0x670 hardirqs last disabled at (243606): [] .__slab_free+0x1f4/0x670 softirqs last enabled at (242982): [] .__do_softirq+0x7ac/0x7dc softirqs last disabled at (242973): [] .do_softirq_own_stack+0x40/0x60 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(native_tlbie_lock); lock(native_tlbie_lock); *** DEADLOCK *** and, indeed, there appear to be various deadlocks. I had to disable KFENCE to be able to test further (or maybe KFENCE works and points out the soft lockup problem observed below - hard for me to determine). 
> powerpc/pseries: Move dtl scanning and steal time accounting to pseries > platform With this patch, CONFIG_DTL must be enabled if CONFIG_PPC_SPLPAR is enabled. CONFIG_PPC_SPLPAR=y and CONFIG_DTL=n results in build failures due to irq.c:(.text+0x2798): undefined reference to `.pseries_accumulate_stolen_time' and many similar errors. I had to enable CONFIG_DTL explicitly to be able to build my test images. CONFIG_PPC_SPLPAR now depends on or requires CONFIG_DTL which in turn depends on CONFIG_DEBUG_FS. That seems odd. With all this worked around, I still get soft lockup problems when trying to boot from SDHCI. I have not been able to bisect this problem. BUG: soft lockup - CPU#0 stuck for 23s! [dd:111] Modules linked in: CPU: 0 PID: 111 Comm: dd Not tainted 6.0.0-10822-g60bb8154d1d7 #1 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac NIP: c0031630 LR: c0031964 CTR: REGS: c7df36a8 TRAP: 0900 Not tainted (6.0.0-10822-g60bb8154d1d7) MSR: 8000b032 CR: 28002228 XER: IRQMASK: 0 GPR00: c0031964 c7df3870 c13e5500 c7df36a8 GPR04: c125dd80 c7df3814 c291d018 GPR08: c2d4bbb8 c7365100 c2d21098 GPR12: 2800 c2e2 100d32e0 100d32b4 GPR16: 100d3301 100d32b9 100d3358 100d32bf GPR20: 2000 100d3372 100d331e c7365918 GPR24: 0e60 0900 0500 GPR28: 0a00 0f00 0002 0003 NIP [c0031630] .replay_soft_interrupts+0x60/0x300 LR [c0031964] .arch_local_irq_restore+0x94/0x1c0 Call Trace: [c7df3870] [c0031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) [c7df38f0] [c0f8a444] .__schedule+0x664/0xa50 [c7df39d0] [c0f8a8b0] .schedule+0x80/0x140 [c7df3a50] [c092f0dc] .try_to_generate_entropy+0x118/0x174 [c7df3b40] [c092e2e4] .urandom_read_iter+0x74/0x140 [c7df3bc0] [c03b0044] .vfs_read+0x284/0x2d0 [c7df3cd0] [c03b0d2c] .ksys_read+0xdc/0x130 [c7df3d80] [c002a88c] .system_call_exception+0x19c/0x330 [c7df3e10] [c000c1d4] system_call_common+0xf4/0x258 --- interrupt: c00 at 0x7fff829fd49c NIP: 7fff829fd49c LR: 1000da90 CTR: REGS: c7df3e80 TRAP: 0c00 Not tainted (6.0.0-10822-g60bb8154d1d7) MSR: 8000f032 
CR: 22002422 XER: IRQMASK: 0 GPR00: 0003 7138df70 7fff82af7300 GPR04: 101102a0 0020 GPR08: GPR12: 7fff82bcaac0
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Tue Oct 11, 2022 at 7:35 PM AEST, Michael Ellerman wrote: > "Jason A. Donenfeld" writes: > > On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote: > >> "Jason A. Donenfeld" writes: > >> > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: > >> >> Hi Michael, > >> >> > >> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > >> >> > powerpc updates for 6.1 > >> >> > > >> >> > - Remove our now never-true definitions for pgd_huge() and > >> >> > p4d_leaf(). > >> >> > > >> >> > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. > >> >> > > >> >> > - Add support for syscall wrappers. > >> >> > > >> >> > - Add support for KFENCE on 64-bit. > >> >> > > >> >> > - Update 64-bit HV KVM to use the new guest state entry/exit > >> >> > accounting API. > >> >> > > >> >> > - Support execute-only memory when using the Radix MMU (P9 or later). > >> >> > > >> >> > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. > >> >> > > >> >> > - Updates to our linker script to move more data into read-only > >> >> > sections. > >> >> > > >> >> > - Allow the VDSO to be randomised on 32-bit. > >> >> > > >> >> > - Many other small features and fixes. > >> >> > >> >> FYI, something in here broke the wireguard test suite, which runs the > >> >> iperf3 networking utility. The full log is here [1], but the relevant > >> >> part > >> >> is: > >> >> > >> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 > >> >> Connecting to host 192.168.241.2, port 5201 > >> >> iperf3: error - failed to read /dev/urandom: Bad address > >> >> > >> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in > >> >> case you have an intuition. > >> > > >> > Huh. 
From iov_iter.c: > >> > > >> > static int copyout(void __user *to, const void *from, size_t n) > >> > { > >> > size_t before = n; > >> > if (should_fail_usercopy()) > >> > return n; > >> > if (access_ok(to, n)) { > >> > instrument_copy_to_user(to, from, n); > >> > n = raw_copy_to_user(to, from, n); > >> > if (n == before) > >> > pr_err("SARU n still %zu pointer is %lx\n", n, > >> > (unsigned long)to); > >> > } > >> > return n; > >> > } > >> > > >> > I added the pr_err() there to catch the failure: > >> > [3.443506] SARU n still 64 pointer is b78db000 > >> > > >> > Also I managed to extract the failing portion of iperf3 into something > >> > smaller: > >> > > >> > int temp; > >> > char *x; > >> > ssize_t l; > >> > FILE *f; > >> > char template[] = "/blah-XX"; > >> > > >> > temp = mkstemp(template); > >> > if (temp < 0) > >> > panic("mkstemp"); > >> > if (unlink(template) < 0) > >> > panic("unlink"); > >> > if (ftruncate(temp, 0x2) < 0) > >> > panic("ftruncate"); > >> > x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, > >> > 0); > >> > if (x == MAP_FAILED) > >> > panic("mmap"); > >> > f = fopen("/dev/urandom", "rb"); > >> > if (!f) > >> > panic("fopen"); > >> > setbuf(f, NULL); > >> > if (fread(x, 1, 0x2, f) != 0x2) > >> > panic("fread"); > >> > >> Does that fail for you reliably? > >> > >> It succeeds for me running under qemu ppce500, though I'm not using your > >> kernel config yet. > > > > Yes, every time without fail, across two systems and two qemu builds. > > OK. Joel worked out that it only fails when built with musl, so that's > why it's succeeding for me (built with glibc). This was independently discovered by several, but we worked out it's because musl uses ftruncate64 here, while glibc doesn't seem to. And ftruncate64 got broken by the syscall wrappers patch on ppc32. The kernel is seeing a 0 length ftruncate call, so the user access sigbuses and can't copy anything. This quick hack gets the test program working again. 
Only very lightly tested so far... Thanks, Nick --- diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h index 9840d572da55..9578cc5e4f84 100644 --- a/arch/powerpc/include/asm/syscalls.h +++ b/arch/powerpc/include/asm/syscalls.h @@ -89,6 +89,27 @@ long compat_sys_rt_sigreturn(void); * responsible for combining parameter pairs. */ +#ifdef CONFIG_PPC32 +long sys_ppc_pread64(unsigned int fd, +char __user *ubuf, compat_size_t count, +u32 reg6, u32 pos1, u32 pos2); +long sys_ppc_pwrite64(unsigned int fd, + const char __user *ubuf, compat_size_t count, + u32 reg6, u32 pos1, u32 pos2); +long sys_ppc_readahead(int fd, u32 r4, + u32 offset1, u32 offset2, u32 count); +long sys_ppc_truncate64(const char __user *path, u32 reg4, +
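
As background for the ftruncate64 explanation above, here is a small user-space model in C of how a 64-bit length reaches a 32-bit PowerPC kernel in a pair of 32-bit registers, and how a handler that reads the wrong registers ends up with a length of 0. The layout (fd in r3, a padding register r4, high word in r5, low word in r6) is my reading of the 32-bit ABI's aligned register pairs; the struct and function names are hypothetical, and the "misaligned" reader illustrates the class of bug rather than reproducing the exact code path that broke.

```c
#include <stdint.h>

/* Hypothetical model of the argument registers at syscall entry. */
struct syscall_regs_model {
    uint32_t r3;   /* fd */
    uint32_t r4;   /* padding, so the 64-bit pair starts at r5 */
    uint32_t r5;   /* high 32 bits of the length */
    uint32_t r6;   /* low 32 bits of the length */
};

/* What a 32-bit C library would place in registers for ftruncate64(). */
static struct syscall_regs_model pack_ftruncate64(uint32_t fd, uint64_t length)
{
    struct syscall_regs_model regs = {
        .r3 = fd,
        .r4 = 0,
        .r5 = (uint32_t)(length >> 32),
        .r6 = (uint32_t)length,
    };
    return regs;
}

/* Correct handler: skip the padding and merge the r5/r6 pair. */
static uint64_t length_from_aligned_pair(const struct syscall_regs_model *regs)
{
    return ((uint64_t)regs->r5 << 32) | regs->r6;
}

/*
 * Broken handler (illustrative): assumes the length starts right
 * after fd, so it merges the padding with the high word instead.
 */
static uint64_t length_from_misaligned_pair(const struct syscall_regs_model *regs)
{
    return ((uint64_t)regs->r4 << 32) | regs->r5;
}
```

For any length below 4 GiB the high word is zero, so the misaligned reader returns 0. The file then stays empty, and touching the mapping made over it faults, which fits the SIGBUS and the failed copy described in the message.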
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
"Jason A. Donenfeld" writes: > On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote: >> "Jason A. Donenfeld" writes: >> > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: >> >> Hi Michael, >> >> >> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: >> >> > powerpc updates for 6.1 >> >> > >> >> > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). >> >> > >> >> > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. >> >> > >> >> > - Add support for syscall wrappers. >> >> > >> >> > - Add support for KFENCE on 64-bit. >> >> > >> >> > - Update 64-bit HV KVM to use the new guest state entry/exit >> >> > accounting API. >> >> > >> >> > - Support execute-only memory when using the Radix MMU (P9 or later). >> >> > >> >> > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. >> >> > >> >> > - Updates to our linker script to move more data into read-only >> >> > sections. >> >> > >> >> > - Allow the VDSO to be randomised on 32-bit. >> >> > >> >> > - Many other small features and fixes. >> >> >> >> FYI, something in here broke the wireguard test suite, which runs the >> >> iperf3 networking utility. The full log is here [1], but the relevant part >> >> is: >> >> >> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 >> >> Connecting to host 192.168.241.2, port 5201 >> >> iperf3: error - failed to read /dev/urandom: Bad address >> >> >> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in >> >> case you have an intuition. >> > >> > Huh. 
From iov_iter.c: >> > >> > static int copyout(void __user *to, const void *from, size_t n) >> > { >> > size_t before = n; >> > if (should_fail_usercopy()) >> > return n; >> > if (access_ok(to, n)) { >> > instrument_copy_to_user(to, from, n); >> > n = raw_copy_to_user(to, from, n); >> > if (n == before) >> > pr_err("SARU n still %zu pointer is %lx\n", n, >> > (unsigned long)to); >> > } >> > return n; >> > } >> > >> > I added the pr_err() there to catch the failure: >> > [3.443506] SARU n still 64 pointer is b78db000 >> > >> > Also I managed to extract the failing portion of iperf3 into something >> > smaller: >> > >> > int temp; >> > char *x; >> > ssize_t l; >> > FILE *f; >> > char template[] = "/blah-XX"; >> > >> > temp = mkstemp(template); >> > if (temp < 0) >> > panic("mkstemp"); >> > if (unlink(template) < 0) >> > panic("unlink"); >> > if (ftruncate(temp, 0x2) < 0) >> > panic("ftruncate"); >> > x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, >> > 0); >> > if (x == MAP_FAILED) >> > panic("mmap"); >> > f = fopen("/dev/urandom", "rb"); >> > if (!f) >> > panic("fopen"); >> > setbuf(f, NULL); >> > if (fread(x, 1, 0x2, f) != 0x2) >> > panic("fread"); >> >> Does that fail for you reliably? >> >> It succeeds for me running under qemu ppce500, though I'm not using your >> kernel config yet. > > Yes, every time without fail, across two systems and two qemu builds. OK. Joel worked out that it only fails when built with musl, so that's why it's succeeding for me (built with glibc). cheers
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
"Jason A. Donenfeld" writes: > On Tue, Oct 11, 2022 at 12:44:20PM +1100, Michael Ellerman wrote: >> "Jason A. Donenfeld" writes: >> > Hi Andrew, >> > >> > On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote: >> >> Thanks for bisecting, this is interesting! Could you provide your >> >> .config and the environment you're running in? Your reproducer doesn't >> >> seem to trigger it on my baremetal POWER8 pseries_le_defconfig. >> > >> > Sure. >> > >> > .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE) >> > Toolchain: >> > https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz >> > >> > You can also just run: >> > >> > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) >> > >> > And that'll assemble the whole thing. >> >> I tried that :) >> >> What host OS are you running that on? >> >> I get: >> >> mkdir -p >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc >> powerpc-linux-musl-gcc -o >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init >> -O3 -pipe -std=gnu11 init.c >> >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: >> cannot find Scrt1.o: No such file or directory >> >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: >> cannot find crti.o: No such file or directory >> >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: >> cannot find crtbeginS.o: No such file or directory >> >> 
/scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: >> cannot find -lgcc >> >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: >> cannot find -lgcc >> collect2: error: ld returned 1 exit status > > Here's what happened: > > - You started the thing and the kernel compile complained about an > unclean tree. > - You ran mrproper. > - You tried to run the thing again. > > amirite? I think so yeah. I tried it on 3 different machines so I'm not sure exactly what I did where, but I definitely ran mrproper on one of them. > If so, what happened is that mrproper deleted the .o files from the > toolchain. Solution: > > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) clean > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) > > Let me know how that goes. Yep that works thanks. And I see the iperf failure. Though I still can't see what the bug is, but hopefully if I stare at it longer I'll work it out. cheers
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote: > "Jason A. Donenfeld" writes: > > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: > >> Hi Michael, > >> > >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > >> > powerpc updates for 6.1 > >> > > >> > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). > >> > > >> > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. > >> > > >> > - Add support for syscall wrappers. > >> > > >> > - Add support for KFENCE on 64-bit. > >> > > >> > - Update 64-bit HV KVM to use the new guest state entry/exit accounting > >> > API. > >> > > >> > - Support execute-only memory when using the Radix MMU (P9 or later). > >> > > >> > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. > >> > > >> > - Updates to our linker script to move more data into read-only > >> > sections. > >> > > >> > - Allow the VDSO to be randomised on 32-bit. > >> > > >> > - Many other small features and fixes. > >> > >> FYI, something in here broke the wireguard test suite, which runs the > >> iperf3 networking utility. The full log is here [1], but the relevant part > >> is: > >> > >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 > >> Connecting to host 192.168.241.2, port 5201 > >> iperf3: error - failed to read /dev/urandom: Bad address > >> > >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in > >> case you have an intuition. > > > > Huh. 
From iov_iter.c: > > > > static int copyout(void __user *to, const void *from, size_t n) > > { > > size_t before = n; > > if (should_fail_usercopy()) > > return n; > > if (access_ok(to, n)) { > > instrument_copy_to_user(to, from, n); > > n = raw_copy_to_user(to, from, n); > > if (n == before) > > pr_err("SARU n still %zu pointer is %lx\n", n, > > (unsigned long)to); > > } > > return n; > > } > > > > I added the pr_err() there to catch the failure: > > [3.443506] SARU n still 64 pointer is b78db000 > > > > Also I managed to extract the failing portion of iperf3 into something > > smaller: > > > > int temp; > > char *x; > > ssize_t l; > > FILE *f; > > char template[] = "/blah-XX"; > > > > temp = mkstemp(template); > > if (temp < 0) > > panic("mkstemp"); > > if (unlink(template) < 0) > > panic("unlink"); > > if (ftruncate(temp, 0x2) < 0) > > panic("ftruncate"); > > x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); > > if (x == MAP_FAILED) > > panic("mmap"); > > f = fopen("/dev/urandom", "rb"); > > if (!f) > > panic("fopen"); > > setbuf(f, NULL); > > if (fread(x, 1, 0x2, f) != 0x2) > > panic("fread"); > > Does that fail for you reliably? > > It succeeds for me running under qemu ppce500, though I'm not using your > kernel config yet. Yes, every time without fail, across two systems and two qemu builds. Jason
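An aside on the quoted debugging patch: raw_copy_to_user() returns the number of bytes it could not copy, so the added check `n == before` fires only when nothing at all was written to the user buffer. That is consistent with the "Bad address" error iperf3 reports. Below is a minimal userspace sketch of that return convention. The names fake_copy_to_user(), classify_copy() and demo_copy() are hypothetical stand-ins for illustration, not kernel functions, and the 64-byte size is borrowed only from the "n still 64" log line.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for raw_copy_to_user(): copies at most `writable`
 * bytes and returns how many bytes were NOT copied (0 means full success).
 */
size_t fake_copy_to_user(char *to, const char *from, size_t n, size_t writable)
{
	size_t done = n < writable ? n : writable;

	memcpy(to, from, done);
	return n - done;
}

enum copy_result { COPY_FULL, COPY_PARTIAL, COPY_NONE };

/* Interpret the "bytes not copied" return value for a given request size. */
enum copy_result classify_copy(size_t requested, size_t not_copied)
{
	if (not_copied == 0)
		return COPY_FULL;
	if (not_copied == requested)
		return COPY_NONE;	/* the case the pr_err() above catches */
	return COPY_PARTIAL;
}

/* Copy a 64-byte buffer of which only the first `writable` bytes can be written. */
enum copy_result demo_copy(size_t writable)
{
	static char dst[64];
	static const char src[64] = "payload";

	return classify_copy(sizeof(src),
			     fake_copy_to_user(dst, src, sizeof(src), writable));
}
```

With this convention, demo_copy(0) corresponds to the situation in the log: the request was for 64 bytes and 64 bytes remained uncopied.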
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Tue, Oct 11, 2022 at 12:44:20PM +1100, Michael Ellerman wrote:
> "Jason A. Donenfeld" writes:
> > Hi Andrew,
> >
> > On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote:
> >> Thanks for bisecting, this is interesting! Could you provide your
> >> .config and the environment you're running in? Your reproducer doesn't
> >> seem to trigger it on my baremetal POWER8 pseries_le_defconfig.
> >
> > Sure.
> >
> > .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE)
> > Toolchain:
> > https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz
> >
> > You can also just run:
> >
> >     ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
> >
> > And that'll assemble the whole thing.
>
> I tried that :)
>
> What host OS are you running that on?
>
> I get:
>
> mkdir -p /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc
> powerpc-linux-musl-gcc -o /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init -O3 -pipe -std=gnu11 init.c
> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory
> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crti.o: No such file or directory
> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crtbeginS.o: No such file or directory
> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc
> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc
> collect2: error: ld returned 1 exit status

Here's what happened:

- You started the thing and the kernel compile complained about an
  unclean tree.
- You ran mrproper.
- You tried to run the thing again.

amirite?

If so, what happened is that mrproper deleted the .o files from the
toolchain. Solution:

    ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) clean
    ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

Let me know how that goes.

Jason
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
"Jason A. Donenfeld" writes: > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: >> Hi Michael, >> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: >> > powerpc updates for 6.1 >> > >> > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). >> > >> > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. >> > >> > - Add support for syscall wrappers. >> > >> > - Add support for KFENCE on 64-bit. >> > >> > - Update 64-bit HV KVM to use the new guest state entry/exit accounting >> > API. >> > >> > - Support execute-only memory when using the Radix MMU (P9 or later). >> > >> > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. >> > >> > - Updates to our linker script to move more data into read-only sections. >> > >> > - Allow the VDSO to be randomised on 32-bit. >> > >> > - Many other small features and fixes. >> >> FYI, something in here broke the wireguard test suite, which runs the >> iperf3 networking utility. The full log is here [1], but the relevant part >> is: >> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 >> Connecting to host 192.168.241.2, port 5201 >> iperf3: error - failed to read /dev/urandom: Bad address >> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in >> case you have an intuition. > > Huh. 
From iov_iter.c: > > static int copyout(void __user *to, const void *from, size_t n) > { > size_t before = n; > if (should_fail_usercopy()) > return n; > if (access_ok(to, n)) { > instrument_copy_to_user(to, from, n); > n = raw_copy_to_user(to, from, n); > if (n == before) > pr_err("SARU n still %zu pointer is %lx\n", n, > (unsigned long)to); > } > return n; > } > > I added the pr_err() there to catch the failure: > [3.443506] SARU n still 64 pointer is b78db000 > > Also I managed to extract the failing portion of iperf3 into something > smaller: > > int temp; > char *x; > ssize_t l; > FILE *f; > char template[] = "/blah-XX"; > > temp = mkstemp(template); > if (temp < 0) > panic("mkstemp"); > if (unlink(template) < 0) > panic("unlink"); > if (ftruncate(temp, 0x2) < 0) > panic("ftruncate"); > x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); > if (x == MAP_FAILED) > panic("mmap"); > f = fopen("/dev/urandom", "rb"); > if (!f) > panic("fopen"); > setbuf(f, NULL); > if (fread(x, 1, 0x2, f) != 0x2) > panic("fread"); Does that fail for you reliably? It succeeds for me running under qemu ppce500, though I'm not using your kernel config yet. cheers
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
"Jason A. Donenfeld" writes: > Hi Andrew, > > On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote: >> Thanks for bisecting, this is interesting! Could you provide your >> .config and the environment you're running in? Your reproducer doesn't >> seem to trigger it on my baremetal POWER8 pseries_le_defconfig. > > Sure. > > .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE) > Toolchain: > https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz > > You can also just run: > > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) > > And that'll assemble the whole thing. I tried that :) What host OS are you running that on? I get: mkdir -p /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc powerpc-linux-musl-gcc -o /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init -O3 -pipe -std=gnu11 init.c /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crti.o: No such file or directory /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crtbeginS.o: No such file or directory /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc 
/scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc collect2: error: ld returned 1 exit status cheers
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
Hi Andrew,

On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote:
> Thanks for bisecting, this is interesting! Could you provide your
> .config and the environment you're running in? Your reproducer doesn't
> seem to trigger it on my baremetal POWER8 pseries_le_defconfig.

Sure.

.config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE)
Toolchain: https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz

You can also just run:

    ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

And that'll assemble the whole thing.

Jason
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Mon, 2022-10-10 at 16:26 -0600, Jason A. Donenfeld wrote:
>
> Bisected:
>
> 7e92e01b724526b98cbc7f03dd4afa0295780d56 is the first bad commit
> commit 7e92e01b724526b98cbc7f03dd4afa0295780d56
> Author: Rohan McLure
> Date:   Wed Sep 21 16:56:01 2022 +1000
>
>     powerpc: Provide syscall wrapper
>
>     Implement syscall wrapper as per s390, x86, arm64. When enabled
>     cause handlers to accept parameters from a stack frame rather than
>     from user scratch register state. This allows for user registers to be
>     safely cleared in order to reduce caller influence on speculation
>     within syscall routine. The wrapper is a macro that emits syscall
>     handler symbols that call into the target handler, obtaining its
>     parameters from a struct pt_regs on the stack.
>
>     As registers are already saved to the stack prior to calling
>     system_call_exception, it appears that this function is executed more
>     efficiently with the new stack-pointer convention than with parameters
>     passed by registers, avoiding the allocation of a stack frame for this
>     method. On a 32-bit system, we see >20% performance increases on the
>     null_syscall microbenchmark, and on a Power 8 the performance gains
>     amortise the cost of clearing and restoring registers which is
>     implemented at the end of this series, seeing final result of ~5.6%
>     performance improvement on null_syscall.
>
>     Syscalls are wrapped in this fashion on all platforms except for the
>     Cell processor as this commit does not provide SPU support. This can be
>     quickly fixed in a successive patch, but requires spu_sys_callback to
>     allocate a pt_regs structure to satisfy the wrapped calling convention.
>
>     Co-developed-by: Andrew Donnellan
>     Signed-off-by: Andrew Donnellan
>     Signed-off-by: Rohan McLure
>     Reviewed-by: Nicholas Piggin
>     [mpe: Make incompatible with COMPAT to retain clearing of high bits of args]
>     Signed-off-by: Michael Ellerman
>     Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmcl...@linux.ibm.com

Thanks for bisecting, this is interesting! Could you provide your
.config and the environment you're running in? Your reproducer doesn't
seem to trigger it on my baremetal POWER8 pseries_le_defconfig.

--
Andrew Donnellan    OzLabs, ADL Canberra
a...@linux.ibm.com  IBM Australia Limited
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Mon, Oct 10, 2022 at 02:03:09PM -0600, Jason A. Donenfeld wrote: > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: > > Hi Michael, > > > > On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > > > powerpc updates for 6.1 > > > > > > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). > > > > > > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. > > > > > > - Add support for syscall wrappers. > > > > > > - Add support for KFENCE on 64-bit. > > > > > > - Update 64-bit HV KVM to use the new guest state entry/exit accounting > > > API. > > > > > > - Support execute-only memory when using the Radix MMU (P9 or later). > > > > > > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. > > > > > > - Updates to our linker script to move more data into read-only sections. > > > > > > - Allow the VDSO to be randomised on 32-bit. > > > > > > - Many other small features and fixes. > > > > FYI, something in here broke the wireguard test suite, which runs the > > iperf3 networking utility. The full log is here [1], but the relevant part > > is: > > > > [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 > > Connecting to host 192.168.241.2, port 5201 > > iperf3: error - failed to read /dev/urandom: Bad address > > > > I'll see if I can narrow it down a bit more and bisect. But just FYI, in > > case you have an intuition. > > Huh. 
From iov_iter.c: > > static int copyout(void __user *to, const void *from, size_t n) > { > size_t before = n; > if (should_fail_usercopy()) > return n; > if (access_ok(to, n)) { > instrument_copy_to_user(to, from, n); > n = raw_copy_to_user(to, from, n); > if (n == before) > pr_err("SARU n still %zu pointer is %lx\n", n, > (unsigned long)to); > } > return n; > } > > I added the pr_err() there to catch the failure: > [3.443506] SARU n still 64 pointer is b78db000 > > Also I managed to extract the failing portion of iperf3 into something > smaller: > > int temp; > char *x; > ssize_t l; > FILE *f; > char template[] = "/blah-XX"; > > temp = mkstemp(template); > if (temp < 0) > panic("mkstemp"); > if (unlink(template) < 0) > panic("unlink"); > if (ftruncate(temp, 0x2) < 0) > panic("ftruncate"); > x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); > if (x == MAP_FAILED) > panic("mmap"); > f = fopen("/dev/urandom", "rb"); > if (!f) > panic("fopen"); > setbuf(f, NULL); > if (fread(x, 1, 0x2, f) != 0x2) > panic("fread"); > > Jason Bisected: 7e92e01b724526b98cbc7f03dd4afa0295780d56 is the first bad commit commit 7e92e01b724526b98cbc7f03dd4afa0295780d56 Author: Rohan McLure Date: Wed Sep 21 16:56:01 2022 +1000 powerpc: Provide syscall wrapper Implement syscall wrapper as per s390, x86, arm64. When enabled cause handlers to accept parameters from a stack frame rather than from user scratch register state. This allows for user registers to be safely cleared in order to reduce caller influence on speculation within syscall routine. The wrapper is a macro that emits syscall handler symbols that call into the target handler, obtaining its parameters from a struct pt_regs on the stack. As registers are already saved to the stack prior to calling system_call_exception, it appears that this function is executed more efficiently with the new stack-pointer convention than with parameters passed by registers, avoiding the allocation of a stack frame for this method. 
On a 32-bit system, we see >20% performance increases on the null_syscall microbenchmark, and on a Power 8 the performance gains amortise the cost of clearing and restoring registers which is implemented at the end of this series, seeing final result of ~5.6% performance improvement on null_syscall. Syscalls are wrapped in this fashion on all platforms except for the Cell processor as this commit does not provide SPU support. This can be quickly fixed in a successive patch, but requires spu_sys_callback to allocate a pt_regs structure to satisfy the wrapped calling convention. Co-developed-by: Andrew Donnellan Signed-off-by: Andrew Donnellan Signed-off-by: Rohan McLure Reviewed-by: Nicholas Piggin [mpe: Make incompatible with COMPAT to retain clearing of high bits of args] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmcl...@linux.ibm.com arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/syscall.h | 4 +++ arch/powerpc/include/asm/syscall_wrapper.h | 51
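The commit message above describes the wrapper as a macro that emits a handler symbol which fetches its parameters from a struct pt_regs and then calls the real handler. The following is a simplified userspace sketch of that idea only, not the macro from arch/powerpc/include/asm/syscall_wrapper.h. The names struct demo_regs, DEMO_SYSCALL_DEFINE3, demo_sys_add3() and demo_call_add3() are invented for illustration; the one detail taken from the architecture is that powerpc passes syscall arguments in r3 onwards.

```c
/* Hypothetical, much-reduced stand-in for the kernel's struct pt_regs. */
struct demo_regs {
	unsigned long gpr[32];
};

/*
 * Emit demo_sys_<name>(regs): a wrapper that loads three arguments from the
 * saved registers r3..r5 and passes them to the real handler, whose body
 * follows the macro invocation.
 */
#define DEMO_SYSCALL_DEFINE3(name, t1, a1, t2, a2, t3, a3)		\
	static long demo_do_##name(t1 a1, t2 a2, t3 a3);		\
	long demo_sys_##name(const struct demo_regs *regs)		\
	{								\
		return demo_do_##name((t1)regs->gpr[3],			\
				      (t2)regs->gpr[4],			\
				      (t3)regs->gpr[5]);		\
	}								\
	static long demo_do_##name(t1 a1, t2 a2, t3 a3)

/* An illustrative "syscall" that simply adds its three arguments. */
DEMO_SYSCALL_DEFINE3(add3, long, a, long, b, long, c)
{
	return a + b + c;
}

/* Simulate syscall entry: save the arguments into regs, then dispatch. */
long demo_call_add3(long a, long b, long c)
{
	struct demo_regs regs = { .gpr = { 0 } };

	regs.gpr[3] = (unsigned long)a;
	regs.gpr[4] = (unsigned long)b;
	regs.gpr[5] = (unsigned long)c;
	return demo_sys_add3(&regs);
}
```

Because the handler only ever sees values read back out of the saved register block, the way each argument is laid out in that block has to match what userspace actually placed in the registers.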
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote:
> Hi Michael,
>
> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
> > powerpc updates for 6.1
> >
> >  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
> >
> >  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
> >
> >  - Add support for syscall wrappers.
> >
> >  - Add support for KFENCE on 64-bit.
> >
> >  - Update 64-bit HV KVM to use the new guest state entry/exit accounting
> >    API.
> >
> >  - Support execute-only memory when using the Radix MMU (P9 or later).
> >
> >  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
> >
> >  - Updates to our linker script to move more data into read-only sections.
> >
> >  - Allow the VDSO to be randomised on 32-bit.
> >
> >  - Many other small features and fixes.
>
> FYI, something in here broke the wireguard test suite, which runs the
> iperf3 networking utility. The full log is here [1], but the relevant part
> is:
>
> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
> Connecting to host 192.168.241.2, port 5201
> iperf3: error - failed to read /dev/urandom: Bad address
>
> I'll see if I can narrow it down a bit more and bisect. But just FYI, in
> case you have an intuition.

Huh. From iov_iter.c:

    static int copyout(void __user *to, const void *from, size_t n)
    {
        size_t before = n;
        if (should_fail_usercopy())
            return n;
        if (access_ok(to, n)) {
            instrument_copy_to_user(to, from, n);
            n = raw_copy_to_user(to, from, n);
            if (n == before)
                pr_err("SARU n still %zu pointer is %lx\n", n,
                       (unsigned long)to);
        }
        return n;
    }

I added the pr_err() there to catch the failure:

    [3.443506] SARU n still 64 pointer is b78db000

Also I managed to extract the failing portion of iperf3 into something
smaller:

    int temp;
    char *x;
    ssize_t l;
    FILE *f;
    char template[] = "/blah-XX";

    temp = mkstemp(template);
    if (temp < 0)
        panic("mkstemp");
    if (unlink(template) < 0)
        panic("unlink");
    if (ftruncate(temp, 0x2) < 0)
        panic("ftruncate");
    x = mmap(NULL, 0x2, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0);
    if (x == MAP_FAILED)
        panic("mmap");
    f = fopen("/dev/urandom", "rb");
    if (!f)
        panic("fopen");
    setbuf(f, NULL);
    if (fread(x, 1, 0x2, f) != 0x2)
        panic("fread");

Jason
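For anyone wanting to try the snippet above outside the wireguard test harness, here is a self-contained sketch of the same call sequence as a function that reports which step failed. It is an adaptation, not Jason's original file: the function name fread_into_private_mapping() is invented, panic() is replaced by negative return codes, and the path template and mapping length are caller-supplied because the values in the archived message appear truncated. The values in the usage note below are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

/*
 * Create an unlinked temp file of `len` bytes, map it MAP_PRIVATE, and
 * fread() `len` bytes from /dev/urandom into the mapping.
 * Returns 0 on success, or a negative number naming the failing step.
 * `path_template` must end in "XXXXXX", as mkstemp() requires.
 */
int fread_into_private_mapping(const char *path_template, size_t len)
{
	char path[256];
	char *map;
	FILE *f;
	int fd, ret = 0;

	if (strlen(path_template) >= sizeof(path))
		return -1;
	strcpy(path, path_template);	/* mkstemp() modifies its argument */

	fd = mkstemp(path);
	if (fd < 0)
		return -2;
	if (unlink(path) < 0) {
		close(fd);
		return -3;
	}
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -4;
	}

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -5;
	}

	f = fopen("/dev/urandom", "rb");
	if (!f) {
		ret = -6;
		goto out;
	}
	setbuf(f, NULL);	/* disable stdio buffering, as in the snippet above */
	if (fread(map, 1, len, f) != len)
		ret = -7;	/* the step reported as failing in this thread */
	fclose(f);
out:
	munmap(map, len);
	close(fd);
	return ret;
}
```

A call such as fread_into_private_mapping("/tmp/repro-XXXXXX", 0x20000) is expected to return 0 on an unaffected system; both arguments there are illustrative choices rather than values taken from the thread.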
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
Hi Michael,

On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote:
> powerpc updates for 6.1
>
>  - Remove our now never-true definitions for pgd_huge() and p4d_leaf().
>
>  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
>
>  - Add support for syscall wrappers.
>
>  - Add support for KFENCE on 64-bit.
>
>  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
>
>  - Support execute-only memory when using the Radix MMU (P9 or later).
>
>  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
>
>  - Updates to our linker script to move more data into read-only sections.
>
>  - Allow the VDSO to be randomised on 32-bit.
>
>  - Many other small features and fixes.

FYI, something in here broke the wireguard test suite, which runs the
iperf3 networking utility. The full log is here [1], but the relevant part
is:

[+] NS1: iperf3 -Z -t 3 -c 192.168.241.2
Connecting to host 192.168.241.2, port 5201
iperf3: error - failed to read /dev/urandom: Bad address

I'll see if I can narrow it down a bit more and bisect. But just FYI, in
case you have an intuition.

Jason

[1] https://build.wireguard.com/linux/4de65c5830233e7a4adf2e679510089ec4e210c7/powerpc.log
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag
The pull request you sent on Sun, 09 Oct 2022 22:01:39 +1100:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
> tags/powerpc-6.1-1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4899a36f91a9f9b06878471096bd143e7253006d

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html