On 7/27/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> > Maybe I should resurrect it & send it out...

Hmm, something that hooks in not only at do_IRQ time (as the present
in-mainline stack overflow check does)?

> > (FWIW I think I recall that the warning itself sometimes tipped the
> > scales enough on 4k stacks to bring the box down)
>
> You can always switch stack for the printk and it probably should panic
> at that point and give a trace then die as that is what we are trying to
> prove does not occur

Yes, only yesterday I saw exactly this happen with DEBUG_STACKOVERFLOW
when doing a udf -> pktcdvd -> cdrom -> ide_cd thing. It's one of those
reproducible will-crash-4k-stacks tests, especially if your build has
debug options enabled that make on-stack structures (where such exist
on the codepath) a bit heavier.

Admittedly, what seems to have happened is a bit pathological:

[ 481.836378] cdrom: entering cdrom_count_tracks
[ 481.844266] BUG: sleeping function called from invalid context at include/asm/semaphore.h:98
[ 481.844434] do_IRQ: stack overflow: 164
[ 481.844540] [<c0405cfe>] show_trace_log_lvl+0x19/0x2e
[ 481.844707] [<c0405dfe>] show_trace+0x12/0x14
[ 481.844867] [<c0405e14>] dump_stack+0x14/0x16
[ 481.845027] [<c0406ff6>] do_IRQ+0x7b/0xe1
[ 481.845186] [<c040583e>] common_interrupt+0x2e/0x34
[ 481.845348] [<c042b8e7>] printk+0x1b/0x1d
[ 481.845507] [<c0422c05>] __might_sleep+0x81/0xdc
[ 481.845668] [<c066d869>] __reacquire_kernel_lock+0x2d/0x4f
[ 481.845833] [<c066b09b>] schedule+0x78a/0x7a4
[ 481.845996] [<c066b538>] wait_for_completion+0x72/0x97
[ 481.846160] [<c05937a6>] ide_do_drive_cmd+0xeb/0x109
[ 481.846324] [<f89172a2>] cdrom_queue_packet_command+0x40/0xc5 [ide_cd]
[ 481.846503] [<f89175b7>] ide_cdrom_packet+0x86/0xa4 [ide_cd]
[ 481.846669] [<f8854dc1>] cdrom_get_disc_info+0x48/0x87 [cdrom]
[ 481.846839] [<f8854ec6>] cdrom_get_last_written+0x2a/0xfe [cdrom]
[ 481.847009] [<f891831b>] cdrom_read_toc+0x39d/0x3f3 [ide_cd]
[ 481.847231] [<f8918e7e>] ide_cdrom_audio_ioctl+0x130/0x1ce [ide_cd]
[ 481.847414] [<f8854123>] cdrom_count_tracks+0x5c/0x126 [cdrom]
[ 481.847583] [<f8855688>] cdrom_open+0x147/0x79c [cdrom]
[ 481.847748] [<f891799a>] idecd_open+0x75/0x8a [ide_cd]
[ 481.847912] [<c04aac0e>] do_open+0x1d1/0x284
[ 481.848079] [<c04aad89>] __blkdev_get+0x73/0x7e
[ 481.848242] [<c04aada9>] blkdev_get+0x15/0x17
[ 481.848411] [<f8b34b6b>] pkt_open+0x99/0xc6e [pktcdvd]
[ 481.848583] [<c04aaad3>] do_open+0x96/0x284
[ 481.848745] [<c04aad89>] __blkdev_get+0x73/0x7e
[ 481.848910] [<c04aada9>] blkdev_get+0x15/0x17

(... the trace cut off there, and then the box froze hard, no sysrq ...)

The mount(2) hit the wait_for_completion() in ide_do_drive_cmd(), with
little stack left at that point. But I have no idea why the
__reacquire_kernel_lock() from schedule() triggered a might_sleep()
warning there. The code in sched.c and kernel_lock.c looks obviously
correct: the down(&kernel_sem) only happens with both irqs and
preemption on.

Anyway, the second line of the printk() in __might_sleep() (the one that
tells us in_atomic() and irqs_disabled()) was about to be printed when
an interrupt decided to join the fun. do_IRQ() comes in and, with
DEBUG_STACKOVERFLOW on, notices that only 164 bytes of stack are left
and decides to dump_stack() ... and while we were doing just that, we
died.

(this was 2.6.23-rc1-mm1)
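
For readers not familiar with the check: below is a rough user-space model of the arithmetic behind the "do_IRQ: stack overflow: N" message. It is only a sketch and not the kernel source. The helper names and THREAD_INFO_SIZE are made up for illustration, and the warning threshold (STACK_WARN as one eighth of THREAD_SIZE) is an assumption about this kernel version. The idea is that each kernel stack lives in a THREAD_SIZE-aligned area with struct thread_info at the bottom, so masking the stack pointer gives its offset within that area, and subtracting the size of thread_info gives the bytes still free.

```c
#include <stdint.h>

/* Illustrative values only; the real ones depend on arch and config. */
#define THREAD_SIZE      4096u              /* 4k stacks */
#define STACK_WARN       (THREAD_SIZE / 8)  /* assumed warning threshold */
#define THREAD_INFO_SIZE 52u                /* hypothetical sizeof(struct thread_info) */

/*
 * Offset of the stack pointer inside its THREAD_SIZE-aligned stack area.
 * The stack grows down towards thread_info, which sits at offset 0.
 */
static unsigned long stack_offset(uintptr_t sp)
{
    return (unsigned long)(sp & (THREAD_SIZE - 1));
}

/* Bytes left between sp and the end of thread_info (negative: overrun). */
static long stack_bytes_left(uintptr_t sp)
{
    return (long)stack_offset(sp) - (long)THREAD_INFO_SIZE;
}

/* Nonzero when the free stack has dropped below the warning threshold. */
static int stack_is_low(uintptr_t sp)
{
    return stack_offset(sp) < THREAD_INFO_SIZE + STACK_WARN;
}
```

With these assumed sizes, a stack pointer 216 bytes above the base of its stack area would report 164 bytes left and trip the warning. Note that the model says nothing about how much stack the warning path itself (printk() plus dump_stack()) needs, which is the problem described above.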