On Friday, May 10, 2013 11:13:27 PM Colin Cross wrote: > On Fri, May 10, 2013 at 2:28 PM, Zoran Markovic > <zoran.marko...@linaro.org> wrote: > > From: Benoit Goby <ben...@android.com> > > > > Below is a patch from android kernel that detects a driver suspend/resume > > lockup and captures dump in the kernel log. Please review and provide > > comments. > > This paragraph should go below the --- line so it doesn't end up in > the final commit message. > > > Rather than hard-lock the kernel, dump the suspend/resume thread stack and > > BUG() when a driver takes too long to suspend/resume.
And how exactly is that different from hanging the kernel? > > The timeout is set to 12 seconds to be longer than the usbhid 10 > > second timeout. Which is kind of arbitrary. > > Exclude from the watchdog the time spent waiting for children that > > are resumed asynchronously and time every device, whether or not they > > resumed synchronously. What about changing the code to use wait_for_completion_timeout() instead of wait_for_completion() and actually trying to recover if one of them times out? [It could try to unregister the device in question and if *that* hangs indefinitely, *then* commit a panic() or something similar, but if it succeeds, abort the ongoing suspend or complete the resume without that device.] Of course, that would involve changing things not to depend on async, but might be better than slapping a timer on top. > > This patch is targeted for mobile devices where a suspend/resume lockup > > could cause a system reboot and catch user's attention. Information > > about failing device can later be retrieved from captured log in > > subsequent boot session. > > I would take out the phrase "catch user's attention", the intention of > this patch is actually the opposite - get the system back to working > normally as fast as possible, while still putting enough information > to debug the problem into the log. > > > The hardware watchdog timer is likely suspended during this time and > > couldn't be relied upon. The soft-lockup detector would eventually tell > > that tasks are not scheduled, but would provide little context as to why. > > The patch hence uses system timer and assumes it is still active while the > > devices are suspended/resumed. I must say I'm not particularly impressed by this patch series. I understand the motivation, but I'm wondering if that's the best we can do. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/