On 2021-01-22, Sven Schnelle <[email protected]> wrote: > I'm seeing crashes on s390x with this patch while running the glibc > testsuite. The glibc test suite triggers a few FPU exceptions which > are printed to the kernel log by default. Looking at the crash dump, > i see that the console_drivers pointer seems to be overwritten while > printing that (user space) warning: > > crash> print *console_drivers > $1 = { > name = "\247\071\377\373\354!\004\214\000\331\300\345\000\033\353_", > write = 0xa7190001eb119078, > read = 0xe6e320b0050090, > device = 0xa51e0010a7210001, > unblank = 0xa7291000b9e28012, > setup = 0xe320f0a00004e320, > exit = 0x208e0094eb112000, > match = 0xcec1c006e017f, > flags = -4984, > index = 133, > cflag = 8143136, > data = 0x800458108004ec12, > next = 0x5007eaf000000 > } > > crash> x/16i console_drivers > 0x79009700: lghi %r3,-5 > 0x79009704: aghik %r2,%r1,1164 > 0x7900970a: brasl %r14,0x79386dc8 > 0x79009710: lghi %r1,1 > 0x79009714: laog %r1,%r1,120(%r9) > 0x7900971a: llgc %r2,5(%r11) > 0x79009720: llilh %r1,16 > 0x79009724: tmll %r2,1 > 0x79009728: lghi %r2,4096 > 0x7900972c: locgre %r1,%r2 > 0x79009730: lg %r2,160(%r15) > 0x79009736: llc %r2,142(%r2) > 0x7900973c: srlg %r1,%r1,0(%r2) > 0x79009742: clij %r1,1,12,0x7900981e > 0x79009748: cgij %r8,0,8,0x79009852 > 0x7900974e: la %r2,4(%r8) > > crash> sym 0x79009700 > 79009700 (t) iomap_finish_ioend+192 > /usr/src/debug/kernel-5.10.fc33/linux-5.11.0-20210122.rc4.git0.9f29bd8b2e71.300.fc33.s390x/./include/linux/pagemap.h: > 59 > > So somehow the pointer for console_drivers changes. > > I can't provide the normal kernel crash output as printk is no longer > working,
I don't understand what you mean here. The crash tool can dump the printk buffer. I would be curious to see what the messages look like. It would also be helpful to know the last message you saw on the console. > but in crash the backtrace looks like this: > crash> bt > PID: 859915 TASK: dad24000 CPU: 9 COMMAND: "ld.so.1" > LOWCORE INFO: > -psw : 0x0002000180000000 0x0000000078c8400e > -function : stop_run at 78c8400e > -prefix : 0x0041e000 > -cpu timer: 0x7ffff953a957e166 > -clock cmp: 0xd92876863ac66b00 > -general registers: > 0xf0f0f0f000000000 0x0000000078c8400e > 0x0000000079984830 0x0000000079984830 > 0x0000000079ac62c0 0x0000000000000002 > 0x0000000000000038 000000000000000000 > 0x000000007a19157e 000000000000000000 > 0x00000000fffffffa 000000000000000000 > 0x00000000dad24000 000000000000000000 > 0x0000000078c8416e 0x00000380033136f0 > -access registers: > 0x7d5b4da0 0xa2a795c0 0000000000 0000000000 > 0000000000 0000000000 0000000000 0000000000 > 0000000000 0000000000 0000000000 0000000000 > 0000000000 0000000000 0000000000 0000000000 > -control registers: > 0x00a0000014966a10 0x000000007a15c007 > 0x000000000001d0a0 000000000000000000 > 0x000000000000ffff 0x000000000001d080 > 0x000000003b000000 0x00000001f29b81c7 > 0x0000000000008000 000000000000000000 > 000000000000000000 000000000000000000 > 000000000000000000 0x000000007a15c007 > 0x00000000db000000 0x000000000001d0c0 > -floating point registers: > 0x5faaaaabeb67f92f 0x0007fffffffffff8 > 0x40400000eb67f000 000000000000000000 > 0x0000000000000007 0x0000006400000000 > 0x000002aa2a63f028 0x000000000963cf85 > 0x000000002a642c60 000000000000000000 > 0x7f85516ceb67f222 0x000003ffeb67f528 > 0x000003ffeb67f230 0x0000000100000000 > 0x000003ffeb67f21d 0x000002aa28d2d970 > > #0 [38003313710] stop_run at 78c8400e > #1 [38003313728] atomic_notifier_call_chain at 78ced536 > #2 [38003313778] panic at 7974aeca > #3 [38003313818] die at 78c897ea > #4 [38003313880] do_no_context at 78c9b230 > #5 [380033138b8] pgm_check_handler at 7976438c > PSW: 0404c00180000000 0000000078d304ba (kmsg_dump_rewind+290) > GPRS: 0000000000000020 0000000000000000 0000000000000000 00000000ffffec88 > 000000000000000e 00000000793f4800 0000000000000224 0000000000000000 > 0000000000000000 0000000000000000 0000000000000224 0005007eaf000000 > 00000000dad24000 0000000000000000 0000038003313a68 0000038003313a18 > #0 [38003313a68] console_unlock at 78d3158e > #1 [38003313b48] vprintk_emit at 78d32f50 > #2 [38003313ba8] vprintk_default at 78d32f9e > #3 [38003313c00] printk at 7974bfae > #4 [38003313c90] show_code at 78c8588e > #5 [38003313de0] show_registers at 7974a26e > #6 [38003313e70] show_regs at 78c8961e > #7 [38003313ea0] pgm_check_handler at 7976438c > PSW: 0705300080000000 000000007d7983b2 (user space) > GPRS: 0000000000000000 000000007d798360 0000000000000010 000003ff00000004 > 000000007f854dd0 000000007d7e9788 000003ff00401b30 000000007f854ff8 > 000003ff00000000 000003ff7d5b4da0 000003ff00000000 000003ff00000010 > 000003ff004022c6 000003ff004024b0 0000000080401796 000000007f854de8 > > kmsg_dump_rewind+290 is: > > if (!(con->flags & CON_ENABLED)) > continue; > > in kernel/printk/printk.c:1845 > > I haven't dived into whether our show_code() is doing something wrong > which was covered in the past or whether that's because of the patch > above but wanted to make you aware of that in case you have an idea. > Otherwise i have to dig into the code. Unless we are dealing with very long printk messages that normally get truncated (800+ characters) this patch simply adds a string terminator. I do not see how that could possibly cause this kind of damage. When this triggers, there is nothing happening with consoles registering or deregistering, right? The string terminator (kernel/printk/printk.c:1402) is only added for paranoia. If you comment that out, this patch will have no effect (for "normal" length lines). I would be curious if that somehow makes a difference for you. John Ogness

