2012/10/16 Andrew Morton <a...@linux-foundation.org>: > On Mon, 15 Oct 2012 19:38:46 +0800 > Qing Z <njumi...@gmail.com> wrote: > >> >> atomic_notifier_call_chain(&panic_notifier_list, 0, buf); >> >> >> >> + /* >> >> + * Unlock the console anyway here, in case it's occupied by another >> >> + * one which has no chance to unlock the console thus prevents the >> >> + * panic log prints on the console. >> >> + */ >> >> + console_unlock(); >> >> + >> >> bust_spinlocks(0); >> >> >> >> if (!panic_blink) >> > >> > hm. console_unlock() does a large amount of work, and it seems risky >> > to do all of that when the system is in a bad-and-getting-worse state. >> > >> > Is there some more modest thing we could do here, for example, >> > >> > if (console_locked) { >> > up(&console_sem); >> > console_locked = 0; >> > } >> > >> > or something along those lines? >> > >> > Also, perhaps this operation should be moved into bust_spinlocks(). >> > What would have happened if your code had triggered an oops, rather >> > than called panic()? >> > >> > >> Hi Andrew, >> Thanks for your reply! >> For your question" What would have happened if your code had >> triggered an oops, rather than called panic()?", actually we found the >> issue when trigger an oops. When we call FBIOPAN_DISPLAY in >> ./drivers/video/fbmem.c, it will first lock console, if we trigger an >> oops before unlock console, the issue happen. It also exist when call >> panic() directly in the same case. It is a common issue for panic >> process. >> I have two options for solution: >> 1. I agree with your suggestion that add some modest thing in >> bust_spinlocks(), bust_spinlocks() is supposed to clear spinlocks >> which prevent oops information from reaching the user. But it didn't >> clear console_sem. We can add codes that clear console_sem. >> 1) add up(&console_sem) in bust_spinlocks(0). >> It will be risky in case that no printk after >> bust_spinlocks(0) in >> panic(), because no console_unlock() to print log out. >> 2) call console_unlock()in bust_spinlocks(0). >> For bust_spinlocks(0), console_unblank() is used to flush oops >> to >> mtdoops console(commit: b61312d353da1871778711040464b10f5cd904df). >> Logically, if panic without the issue, console_unlock is called after >> couples of console_lock and console_unlock; if panic with the issue, >> will it be risky call console_unlock() in console_unblank() after >> console_lock()? >> 2. Moreover, there is another option. We can also add protect codes >> in vprintk(), vprintk() just cover the cases that two cores' log >> interleave when panic and printk recurse itself. We can add all cases' >> protection here. Actually the original vprintk() don___t have the issue, >> but after the patch(commit: fe21773d655c2c64641ec2cef499289ea175c817) >> which fix two cores' log interleave issue , the issue is not covered. >> I add a flag after panic_smp_self_stop() in panic(), and check the >> flag, if flag is set, vprintk will call zap_locks(), I have tested the >> option, the issue also disappear. >> What do you think? > > The #1 priority is to get the oops message reliably delivered. > > That means we should avoid console_unlock() on the oops path: it's far > too complicated and risks deadlocks, re-oopses, recursion, etc. > > If there was text queued in the console layer and that text fails to be > emitted, well, that's sad, but it's more important that the oops > message be displayed. > > If the oops trace is occasionally interleaved with other text then > that's sad too, but at least the info we need is readable. Oopses > inside console_lock() are rare. > > > So I'd suggest that the code in bust_spinlocks(1) should simply do > whatever needs to be done to make the forthcoming oops trace be > visible, and leave it at that - don't bother trying to flush out any > old text. > > > Also, we should be careful with things like up() on a semaphore which > hasn't been down()ed. Because under some Kconfig combinations, such an > operation might trigger debugging traces and we could get into a big > mess. (An up() on non-down()ed semaphore is actually an OK operation, > so this was a bad example. But you see the problem). > Hi Andrew, Thanks for your suggestions. I am e newbie, may not state the issue clearly. Let me state it again: In ./drivers/video/fbmem.c, codes below cause issues: case FBIOPAN_DISPLAY: ... console_lock(); ret = fb_pan_display(info, &var); console_unlock(); ... break;
issue case 1: 1. core 0 call console_lock(); 2. panic; ... 4. panic process done. Result: all panic log won't be printed. issue case 2: 1. core 0 panic; 2. core 1 call console_lock(); 3. core 0 call smp_send_stop(), core1 stop; 4. core 0 panic process done. Result: only little top part of panic log will be printed. My soluiton according to your suggestions: As you said, the first priority is to get oops message reliably delivered. I think we needn't care about console_sem when panic, just make sure we print the log imediately, so add sema_init(&console_sem,1) in bust_spinlocks(0), just like zap_locks() do. It is safer than console_unlock() or up(). We can't add sema_init(..) in bust_spinlocks(1) due to issue case2, although the condition is rare. About issue case 2: should we avoid call console_lock() when panic? If we init console_sem in panic, old text may be flushed too, but should be before panic oops message. Also we can fix it by updating con_start("con_start = log_end") once panic happen, only log after panic will be printed. What do you think? Thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/