On 6/28/07, Oleg Nesterov <[EMAIL PROTECTED]> wrote:
(trimmed the cc: list) On 06/28, Satyam Sharma wrote: > On 6/27/07, Oleg Nesterov <[EMAIL PROTECTED]> wrote: > > > >Contrary, I believe we should avoid signals when it > >comes to kernel threads. > > And I agree, but there's quite a subtle difference between signals being > used like they normally are, and this case here. Fact is, there is simply > no other way to break out of certain functions (if there was, I would've > preferred that myself). > > In fact, what I'm proposing is that kthreads should *not* be tinkering > with (flushing, handling, dequeueing, whatever) signals at all, because > like you mentioned, if they do that, it's possible that the TIF_SIGPENDING > could get lost.But we do have kthreads which call dequeue_signal(). And perhaps some kthread treats SIGKILL as "urgent exit with a lot of printks" while kthread_should_stop() means "exit gracefully". [..] I believe kthread_stop() should not send the signal. Just because it could be actually dequeued by kthread, and it may have some special meaning for this kthread. [...] Hmm... actually, such a change breaks the while (signal_pending(current)) dequeue_signal_and_so_something(); loop, see jffs2_garbage_collect_thread() for example. In short, I think that kthread_stop() + TIF_SIGPENDING should be a special case, the signal should be sent explicitly before kthread_stop(), like cifs does.
The existence of such kthreads (that dabble in signals, although I guess everybody here agrees kernel threads have no business dabbling in them) is the *only* reason I didn't submit my proposal as an actual patch :-) Perhaps "one day" we'll clean up all such cases, and fix up kthread semantics to simply outlaw signal-handling in kernel threads (I just can't convince myself there could be a good justifiable reason to do that). When that's done, and seeing that kthreadd_setup() does ignore_signals(), kthread_stop() could become simply force_sig() ...
[...] This is what I can't understand completely. Why should we check SIGKILL or signal_pending() in addition to kthread_stop_info.k, what is the point?
... so kthread_stop_info will go away too.
After all, the caller of kthread_stop(k) should know what and why k does.
One, the kthread code could change over time. Two, the solution to the problem is un-intuitive for the user to do. The API should know such cases, and handle them well too. The caller of kthread_stop() would expect it to do the expected, after all (i.e. stop the kthread :-)
In any case, that kind of the changes can break things, just because this means API change.
Well, kthreads are a kernel-internal API anyway, I guess as long as we fix all users, we're fine.
> >I am talking about the case > >when wait_event_interruptible() should not fail (unless something bad > >happens) inside the "while (!kthread_should_stop())" loop. > >Note also that kthread could use TASK_INTERRUPTIBLE sleep > >[...] and because it knows that all signals are ignored. > > Ok, I think you're saying that if a kthread used wait_event_interruptible > (and was not expecting signals, because it ignores them), then bad > things (say exit in inconsistent state, etc) will happen if we break that > wait_event_interruptible unexpectedly. No. Of course, kthread should check the error and doesn't exit in inconsistent state. The question is: why should we break (say) tcp_recvmsg() inside "while (!kthread_should_stop())" loop if it is supposed to succeed unless something bad happens? (I mean, we may have a kthread which doesn't expect the failure unless something bad happens).
First, I don't quite understand this "not expecting failure" / "something bad happens" bit at all. Second, we *must* break that tcp_recvmsg() inside the kthread's main loop, of course! We want it stopped, after all, and if we don't make it "break" out of that function, the kthread _will_never_exit_. [ What's worse, several kthreads are part of modules, and are created during the module_init() and so the module_exit() function needs to call kthread_stop() to clean it up. But the wait_for_completion() in kthread_stop() will never unblock, and so this would also mean the rmmod will _hang_ and module will never get unloaded too. ]
OK, let me give a silly example. The correctly written kthread should check the result of kmalloc(), yes? kthread(k) means that k should exit anyway, yes? So let's change kthread_stop(k) to also set TIF_MEMDIE, this means that __alloc_pages() will fail quickly when get_page_from_freelist() doesn't succeed, but won't start pageout() which may take a while. Please don't explain me why this suggestion is bad :), I am just trying to illustrate my point.
Your example / analogy is indeed *very* bad :-) Please note that this whole thing is about functions that will _simply_*never*_exit_ever_ _unless_ given a signal. [ I think you've been misunderstanding my proposal that I want to send a SIGKILL / signal to the kthread just to ensure it gets stopped / killed "fast" or "asap" etc ... No! The only reason I got into this was because kthread_stop() is an API that fails to do what it should be doing if the corresponding kthread simply happens to be executing certain functions in the kernel that will *never* breakout unless given a signal. ] But finally, I do agree that kthreads already exist out there who want to dabble in signals and we can't break them, so perhaps the signal method for kthread_stop() will have to wait for now. [ BTW even there we're safe as long as we check kthread_stop() _before_ flushing or dequeueing the signals, but then that'll be an ugly rule to try and enforce, of course. ] Satyam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

