On Tue 06-05-14 14:12:34, Will Deacon wrote: > On Tue, May 06, 2014 at 01:29:58PM +0100, Jan Kara wrote: > > On Tue 06-05-14 13:06:48, Will Deacon wrote: > > > On Fri, May 02, 2014 at 11:46:51PM +0100, Jan Kara wrote: > > > > I really dislike this patch. It goes completely against my efforts of > > > > lowering irq latency caused by printing to console (which are the > > > > problems I have observed ;). > > > > > > Hmmm, what makes you think that? Interrupts only remain disabled whilst we > > > process the backlog, which in the usual case should be pretty small. We > > > also > > > hold the logbuf_lock during this time, so we can't get stuck in an > > > unbounded > > > loop. > > > > > > Can you elaborate a bit more on the problems you've observed, please? > > Well, with serial console the backlog can get actually pretty big. During > > boot on large machines I've seen CPUs stuck in that very loop in > > console_unlock() for tens of seconds. Obviously that causes problems - e.g. > > watchdog fires, RCU lockup detector fires, when interrupts are disabled, > > some hardware gives up because its interrupts weren't served for too long. > > All in all the machine just dies. > > Right, so there's the usual compromise here between throughput and latency. I'd see that compromise if enabling & disabling interrupts would be taking considerable amount of time. I don't think that was your concern, was it? Maybe I just misunderstood you...
> I hoped that pegging the amount of irq-disabled work using console_end_seq > would give us the best of both worlds, but you're saying that it can still > lead to issues on large machines (in the worst case, the whole buffer fills > up while we're printing the last lot, so the irq-disabled work amounts to > draining logbuf, right?). Yup. > That said, printing one message each time seems to go too far in the > opposite direction for my liking, so the best bet is likely to limit the > work to some fixed number of messages. Do you have any feeling for such a > limit? If you really are concerned about enabling and disabling of interrupts taking significant time (and it may be, I just don't know), then printing couple of messages without enabling them makes sense. How many is a tricky question since it depends on the console speed. I had a similar problem when I was deciding in my patch when we should ask another CPU to take over printing from the current CPU (to avoid the issues I've described in the previous email). I was experimenting with various stuff but in the end I restorted to a stupid "after X characters are printed". > > And the backlog builds up because while one cpu is doing the printing in > > console_unlock() all the other cpus are busily adding new messages to the > > buffer faster than they can be printed... > > Understood, but that's also the situation without this patch (and not one > that I think you can fix without hurting latency). Sure. I have a patch which transitions printing to another CPU once in a while so single CPU isn't hogged for too long and that solves the issues I have observed. But Alan didn't like this solution so the issue is unfixed for now. > > > > My opinion is that when you are printing from each and every interrupt > > > > which happens so often, then you have a problem and disabling IRQs in > > > > printk so that your interrupt doesn't happen that often seems like a > > > > poor > > > > solution to me. You could as well just ratelimit your debug messages, > > > > couldn't you? > > > > > > My use-case was basically using printk as a debug trace during early boot > > > when bringing up Linux on a new CPU core. I don't think ratelimiting would > > > be the right thing there, since I really did want as many messages to > > > reach the console as possible (which is also why I wrote this patch, not > > > just the other one in the series). > > OK, I understand. It just seems wrong to me to throttle all interrupts on > > the cpu while doing printing just because someone might be doing debug > > printing from the interrupt. Sure it's fine as a debug hack but on a > > production machine that seems rather counterproductive. > > Perhaps, but the one time I *really* want printk to be reliable is when I'm > using it to debug a problem. Yes, I understand that. Honza -- Jan Kara <j...@suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/