On Fri, 2014-09-05 at 00:47 -0700, Luis R. Rodriguez wrote: > On Fri, Sep 5, 2014 at 12:19 AM, Tejun Heo <t...@kernel.org> wrote: > > On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote: > > ... > >> + /* > >> + * I got SIGKILL, but wait for 60 more seconds for completion > >> + * unless chosen by the OOM killer. This delay is there as a > >> + * workaround for boot failure caused by SIGKILL upon device > >> + * driver initialization timeout. > >> + * > >> + * N.B. this will actually let the thread complete regularly, > >> + * wait_for_completion() will be used eventually, the 60 > >> second > >> + * try here is just to check for the OOM over that time. > >> + */ > >> + WARN_ONCE(!test_thread_flag(TIF_MEMDIE), > >> + "Got SIGKILL but not from OOM, if this issue is on > >> probe use .driver.async_probe\n"); > >> + for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++) > >> + if (wait_for_completion_timeout(&done, HZ)) > >> + goto wait_done; > >> + > > > > Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout > > instead of 30? > > Nope! I fell into the same trap and only with tons of patience by part > of Tetsuo with me was I able to grok that the 60 seconds here are not > for increasing the timeout, this is just time spent checking to ensure > that the OOM wasn't the one who triggered the SIGKILL. Even if the > drivers took eons it should be fine now, I tried it :D > > > Why do we even need this with the proposed async > > probing changes? > > Ah -- well without it the way we "find" drivers that need this new > "async feature" is by a bug report and folks saying their system can't > boot, or they say their device doesn't come up. That's all. Tracing > this to systemd and a timeout was one of the most ugliest things ever. > There two insane bug reports you can go check: > > mptsas was the first: > > http://article.gmane.org/gmane.linux.kernel/1669550 > https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
<quote> (2) Currently systemd-udevd unconditionally sends SIGKILL upon hardcoded 30 seconds timeout. As a result, finit_module() of mptsas kernel module receives SIGKILL when waiting for error handler thread to be started. </quote> Hm. Why is this not a systemd-udevd bug for running around killing stuff when it has no idea whether progress is being made or not? -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html