On Tue, 22 Aug 2023 07:31:54 +0000 Christophe Leroy <christophe.le...@csgroup.eu> wrote:
> On 18/08/2023 at 18:23, Erhard Furtner wrote:
> > On Fri, 18 Aug 2023 15:47:38 +0000
> > Christophe Leroy <christophe.le...@csgroup.eu> wrote:
> >
> >> I'm wondering if the problem is just linked to the kernel being built
> >> with CONFIG_SMP or if it is the actual startup of a secondary CPU that
> >> causes the freeze.
> >>
> >> Please leave the btext_unmap() in place because I think it is important
> >> to keep it, and start the kernel with the following parameter:
> >>
> >> nr_cpus=1
> >
> > With btext_unmap() back in place and nr_cpus=1 set the freeze still
> > happens after the 1st btext_unmap:129 on cold boots:
> >
> > [ 0.000000] printk: bootconsole [udbg0] enabled
> > [ 0.000000] Total memory = 2048MB; using 4096kB for hash table
> > [ 0.000000] mapin_ram:125
> > [ 0.000000] mmu_mapin_ram:169 0 30000000 1400000 2000000
> > [ 0.000000] __mmu_mapin_ram:146 0 1400000
> > [ 0.000000] __mmu_mapin_ram:155 1400000
> > [ 0.000000] __mmu_mapin_ram:146 1400000 30000000
> > [ 0.000000] __mmu_mapin_ram:155 20000000
> > [ 0.000000] __mapin_ram_chunk:107 20000000 30000000
> > [ 0.000000] __mapin_ram_chunk:117
> > [ 0.000000] mapin_ram:134
> > [ 0.000000] kasan_mmu_init:129
> > [ 0.000000] kasan_mmu_init:132 0
> > [ 0.000000] kasan_mmu_init:137
> > [ 0.000000] btext_unmap:129
> >
> > Thanks,
>
> Can you replace the call to btext_unmap() by a call to btext_map() at
> the end of MMU_init() ?
>
> If that gives no interesting result, can you leave the call to
> btext_unmap() and add a call to btext_map() at the very beginning of
> function start_kernel() in init/main.c (You may have to add an include of
> asm/btext.h)
>
> With that I hope we can see more stuff.

Ok, I tested out both methods.

1.) Replace btext_unmap() with btext_map() at the end of MMU_init().

Warm boot again is unspectacular (attached).
On cold boots I sometimes get:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:169 0 30000000 1400000 2000000
__mmu_mapin_ram:146 0 1400000
__mmu_mapin_ram:155 1400000
__mmu_mapin_ram:146 1400000 30000000
__mmu_mapin_ram:155 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.5.0-rc7-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023

which shows one line (Linux version...) more than before.

Most of the time I get this more interesting output however:

kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
Linux version 6.5.0-rc7-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023
KASAN init done
list_add corruption. prev->next should be next (c17100c0), but was 2c030000. (prev=c036ac7c).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:30!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/machdep.h:227 die+0xd8/0x39c
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G T ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q
NIP: c0012c64 LR: c0012c58 CTR: 00000000
REGS: c1717d10 TRAP: 0700 Tainted: G T (∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q)
MSR: 00021032 <ME,IR,DR,RI> CR: 24000484 XER: 00000000
GPR00: 00000000 c1717dc0 c1551c40 00000000 00000000 00000000 00000000 00000000
GPR08: 00000000 00000000 00000000 00000000 00000000 00000000 00dd6f30 021f6e90
GPR16: 021f69b0 02201994 00dd6f3c efff3190 00000000 c1717f10 c1455dd8 c11ec6c0
GPR24: 00001032 c0fab540 c1717e10 00000005 c1740000 c1740000 c1746380 c1555a20
NIP [c0012c64] die+0xd8/0x39c
LR [c0012c58] die+0xcc/0x39c
Call Trace:
[c1717dc0] [c0012c58] die+0xcc/0x39c (unreliable)
[c1717e00] [c00047f0] ProgramCheck_virt+0x100/0x150
--- interrupt: 700 at __list_add_valid+0xe8/0x120
NIP: c0854ca0 LR: c0854ca0 CTR: 00000000
REGS: c1717e10 TRAP: 0700 Tainted: G T (∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q)
MSR: 00021032 <ME,IR,DR,RI> CR: 24000488 XER: 00000000
GPR00: 00000000 c1717ec0 c1551c40 0000005d 00000000 00000000 00000000 00000000
GPR08: 00000000 00000000 00000000 00000000 00000000 00000000 00dd6f30 021f6e90
GPR16: 021f69b0 02201994 00dd6f3c efff3190 00000000 c1717f10 c1455dd8 c11ec6c0
GPR24: f82e2fde c11ec680 fefefefe c11ec700 effec7a8 c036ac7c c036ac7c c17100c0
NIP [c0854ca0] __list_add_valid+0xe8/0x120
LR [c0854ca0] __list_add_valid+0xe8/0x120
--- interrupt: 700
[c1717ee8] [c0c18764] of_alias_scan+0x330/0x44c
[c1717f70] [c140b0fc] setup_arch+0x78/0x44c
[c1717fc0] [c14045b0] start_kernel+0x78/0x2d8
[c1717ff0] [000035d0] 0x35d0
Code: 3fa0c174 915f0060 39290001 913e0040 480f602d 38600001 488189f1 387db620 4835654d 813db620 2c090000 40820008 <0fe00000> 80de0080 3fa0c0fb 3ee0c172
---[ end trace 0000000000000000 ]---
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G T ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q
NIP: c0854ca0 LR: c0854ca0 CTR: 00000000
REGS: c1717e10 TRAP: 0700 Tainted: G T (∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q)
MSR: 00021032 <ME,IR,DR,RI> CR: 24000488 XER: 00000000
GPR00: 00000000 c1717ec0 c1551c40 0000005d 00000000 00000000 00000000 00000000
GPR08: 00000000 00000000 00000000 00000000 00000000 00000000 00dd6f30 021f6e90
GPR16: 021f69b0 02201994 00dd6f3c efff3190 00000000 c1717f10 c1455dd8 c11ec6c0
GPR24: f82e2fde c11ec680 fefefefe c11ec700 effec7a8 c036ac7c c036ac7c c17100c0
NIP [c0854ca0] __list_add_valid+0xe8/0x120
LR [c0854ca0] __list_add_valid+0xe8/0x120
Call Trace:
[c1717ec0] [c0854ca0] __list_add_valid+0xe8/0x120 (unreliable)
[c1717ee0] [c0c18764] of_alias_scan+0x330/0x44c
[c1717f70] [c140b0fc] setup_arch+0x78/0x44c
[c1717fc0] [c14045b0] start_kernel+0x78/0x2d8
[c1717ff0] [000035d0] 0x35d0
Code: 7fc5f378 38636060 7f84e378 38630200 4b8b9ec9 0fe00000 3c68c110 7fa6eb78 7fe4fb78 38630180 4b8b9ead <0fe00000> 3c60c110 7fe6fb78 7fa5eb78
---[ end trace 0000000000000000 ]---

2.) Add btext_map() at the very beginning of function start_kernel() in init/main.c:

On cold boots I sometimes get:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:169 0 30000000 1400000 2000000
__mmu_mapin_ram:146 0 1400000
__mmu_mapin_ram:155 1400000
__mmu_mapin_ram:146 1400000 30000000
__mmu_mapin_ram:155 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
btext_unmap:129
btext_unmap:131
Linux version 6.5.0-rc7-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 SMP Wed Aug 23 13:59:00 CEST 2023

which shows one line (Linux version...) more than before.
Most of the time I get this more interesting output however:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/machdep.h:227 die+0xd8/0x39c
Modules linked in:
BUG: Kernel NULL pointer dereference on read at 0x00000050
Faulting instruction address: 0xc014e3bc
Thread overran stack, or stack corrupted
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/machdep.h:227 die+0xd8/0x39c
Modules linked in:
BUG: Kernel NULL pointer dereference on read at 0x00000050
Faulting instruction address: 0xc014e3bc
Thread overran stack, or stack corrupted
[...]

Repeated 10-11 times.

In both cases I needed to transcribe the dmesg from a picture I took from the screen + OCR. Hope the numbers are correct.

Regards,
Erhard
dmesg_65-rc7_g4_00