On Tue, 29 Jan 2013 15:54:24 +0100 Jan Kara <j...@suse.cz> wrote:

> > So I was testing the attached patch which does what we discussed. The bad
> > news is I was able to trigger a situation (twice) when suddenly sda
> > disappeared and thus all IO requests failed with EIO. There is no trace of
> > what's happened in the kernel log. I'm guessing that disabled interrupts on
> > the printing CPU caused scsi layer to time out for some request and fail the
> > device. So where do we go from here?
>
> Andrew? I guess this fell off your radar via the "hrm, strange, need to
> have a closer look later" path?

urgh.  I was hoping that if we left it long enough, one or both of us
would die :(

I fear we will rue the day when we changed printk() to bounce some of
its work up to a kernel thread.

> Currently I'd be inclined to return to my original solution...

Can we make it smarter?  Say, take a peek at the current
softlockup/nmi-watchdog intervals, work out for how long we can afford
to keep interrupts disabled and then use that period and sched_clock()
to work out if we're getting into trouble?  IOW, remove the hard-wired
"1000" thing which will always be too high or too low for all
situations.

Implementation-wise, that would probably end up adding a kernel-wide
function along the lines of

/*
 * Return the maximum number of nanoseconds for which interrupts may be
 * disabled on the current CPU
 */
u64 max_interrupt_disabled_duration(void)
{
	return min(softlockup duration, nmi watchdog duration);
}

Thinking ahead...

Other kernel sites which know they can disable interrupts for a long
time can perhaps use this.

Later, realtimeish systems (for example machine controllers) might want
to add a kernel tunable so they can set the
max_interrupt_disabled_duration() return value much lower.

To make that more accurate, we could add per-cpu, per-irq variables to
record sched_clock() when each CPU enters the interrupt, so the comment
becomes

/*
 * Return the remaining maximum number of nanoseconds for which
 * interrupts may be disabled on the current CPU
 */

This may all be crazy and hopefully we'll never do it, but the design
should permit such things from day one if practical.
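
For illustration only, the budget idea above could be sketched in plain userspace C roughly as follows. This is not kernel code and not a proposed patch: struct irq_budget_cfg, min_u64(), remaining_interrupt_disabled_duration() and interrupt_budget_exhausted() are hypothetical names, the two thresholds are passed in explicitly instead of being read from the real watchdog configuration, and the timestamps stand in for what sched_clock() would return.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical container for the two limits, in nanoseconds. In the
 * kernel these would be derived from the softlockup and NMI watchdog
 * settings; here the caller supplies them.
 */
struct irq_budget_cfg {
	uint64_t softlockup_ns;
	uint64_t nmi_watchdog_ns;
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Maximum number of nanoseconds for which interrupts may stay disabled:
 * the tighter of the two watchdog limits.
 */
uint64_t max_interrupt_disabled_duration(const struct irq_budget_cfg *cfg)
{
	return min_u64(cfg->softlockup_ns, cfg->nmi_watchdog_ns);
}

/*
 * Remaining budget, given the timestamp at which interrupts were
 * disabled and the current timestamp (both in nanoseconds). Returns 0
 * once the budget is used up. If now_ns is earlier than entered_ns the
 * unsigned subtraction wraps to a huge value, which is treated as
 * "budget exhausted" rather than as extra time.
 */
uint64_t remaining_interrupt_disabled_duration(const struct irq_budget_cfg *cfg,
					       uint64_t entered_ns,
					       uint64_t now_ns)
{
	uint64_t budget = max_interrupt_disabled_duration(cfg);
	uint64_t elapsed = now_ns - entered_ns;

	return elapsed >= budget ? 0 : budget - elapsed;
}

/* True when the caller should stop and hand the remaining work off. */
bool interrupt_budget_exhausted(const struct irq_budget_cfg *cfg,
				uint64_t entered_ns, uint64_t now_ns)
{
	return remaining_interrupt_disabled_duration(cfg, entered_ns, now_ns) == 0;
}
```

A printing loop would record a timestamp when it disables interrupts and then check interrupt_budget_exhausted() between records, in place of comparing a counter against a fixed "1000". A tunable for realtimeish systems would amount to one more term in the min().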