Re: Linux 3.19-rc5
On Fri, Feb 6, 2015 at 8:39 AM, Peter Zijlstra wrote: > > I like the macro one though; I worry a wee bit about non-documented > cases through. If someone is doing something way subtle we'll break it > :/ Yeah, I agree. And I wonder how much we should even care. People who do this thing by hand - rather than using the wait-event model - are *likely* legacy stuff or just random drivers, or simply stuff that isn't all that performance-sensitive. So I dunno how worthwhile this all is. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Fri, Feb 06, 2015 at 08:02:41AM -0800, Linus Torvalds wrote: > On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra wrote: > > > > Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody > > cares about that barrier, so make it go away. > > I'd rather not mix this with the patch, and wonder if we should just > do that globally with some preprocessor magic. We do have a fair > number of "set_current_state(TASK_RUNNING)" 138 > and at least for the > *documented* reason for the memory barrier, all of them could/should > be barrier-less. > > So something like > > if (__is_constant_p(state) && state == TASK_RUNNING) > tsk->state = state; > else > set_mb(tsk->state, state); > > might be more general solution than randomly doing one at a time when > changing code around it.. Yeah, or we could do the coccinelle thing and do a mass conversion. I like the macro one though; I worry a wee bit about non-documented cases through. If someone is doing something way subtle we'll break it :/ --- Subject: sched: Avoid the full memory-barrier for set_current_state(TASK_RUNNING) One should never need the full memory barrier implied by set_current_state() to set TASK_RUNNING for the documented reason of avoiding races against wakeup. Suggested-by: Linus Torvalds Signed-off-by: Peter Zijlstra (Intel) --- diff --git a/include/linux/sched.h b/include/linux/sched.h index 8db31ef98d2f..aea44c4eeed8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -243,6 +243,27 @@ extern char ___assert_task_state[1 - 2*!!( ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \ (task->flags & PF_FROZEN) == 0) +/* + * set_current_state() includes a barrier so that the write of current->state + * is correctly serialised wrt the caller's subsequent test of whether to + * actually sleep: + * + * set_current_state(TASK_UNINTERRUPTIBLE); + * if (do_i_need_to_sleep()) + * schedule(); + * + * If the caller does not need such serialisation then use + * __set_current_state(). This is always true for TASK_RUNNING since + * there is no race against wakeup -- both write the same value. + */ +#define ___set_current_state(state)\ +do { \ + if (__is_constant_p(state) && (state) == TASK_RUNNING) \ + current->state = (state); \ + else\ + set_mb(current->state, (state));\ +} while (0) + #ifdef CONFIG_DEBUG_ATOMIC_SLEEP #define __set_task_state(tsk, state_value) \ @@ -256,17 +277,6 @@ extern char ___assert_task_state[1 - 2*!!( set_mb((tsk)->state, (state_value));\ } while (0) -/* - * set_current_state() includes a barrier so that the write of current->state - * is correctly serialised wrt the caller's subsequent test of whether to - * actually sleep: - * - * set_current_state(TASK_UNINTERRUPTIBLE); - * if (do_i_need_to_sleep()) - * schedule(); - * - * If the caller does not need such serialisation then use __set_current_state() - */ #define __set_current_state(state_value) \ do {\ current->task_state_change = _THIS_IP_; \ @@ -275,7 +285,7 @@ extern char ___assert_task_state[1 - 2*!!( #define set_current_state(state_value) \ do {\ current->task_state_change = _THIS_IP_; \ - set_mb(current->state, (state_value)); \ + ___set_current_state(state_value); \ } while (0) #else @@ -285,21 +295,10 @@ extern char ___assert_task_state[1 - 2*!!( #define set_task_state(tsk, state_value) \ set_mb((tsk)->state, (state_value)) -/* - * set_current_state() includes a barrier so that the write of current->state - * is correctly serialised wrt the caller's subsequent test of whether to - * actually sleep: - * - * set_current_state(TASK_UNINTERRUPTIBLE); - * if (do_i_need_to_sleep()) - * schedule(); - * - * If the caller does not need such serialisation then use __set_current_state() - */ #define __set_current_state(state_value) \ do { current->state = (state_value); } while (0) #define set_current_state(state_value) \ - set_mb(current->state, (state_value)) + ___set_current_state(state_value); #endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra wrote: > > Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody > cares about that barrier, so make it go away. I'd rather not mix this with the patch, and wonder if we should just do that globally with some preprocessor magic. We do have a fair number of "set_current_state(TASK_RUNNING)" and at least for the *documented* reason for the memory barrier, all of them could/should be barrier-less. So something like if (__is_constant_p(state) && state == TASK_RUNNING) tsk->state = state; else set_mb(tsk->state, state); might be more general solution than randomly doing one at a time when changing code around it.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, Feb 05, 2015 at 10:14:36PM +0100, Bruno Prémont wrote: > The loop is now replaced by a single WARN() trace - I guess expected: > From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check > is another issue left for the future. Yeah, something like the below will make it go away -- under the assumption that that comment is actually correct, I don't know, the pcmcia people should probably write a better comment :/ Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody cares about that barrier, so make it go away. --- drivers/pcmcia/cs.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/pcmcia/cs.c b/drivers/pcmcia/cs.c index 5292db69c426..5678e161a17d 100644 --- a/drivers/pcmcia/cs.c +++ b/drivers/pcmcia/cs.c @@ -635,6 +635,12 @@ static int pccardd(void *__skt) skt->sysfs_events = 0; spin_unlock_irqrestore(>thread_lock, flags); + /* +* Supposedly this is a rarely contended mutex and +* sleeping is therefore unlikely, the occasional +* extra loop iteration is harmless. +*/ + sched_annotate_sleep(); mutex_lock(>skt_mutex); if (events & SS_DETECT) socket_detect_change(skt); @@ -679,7 +685,7 @@ static int pccardd(void *__skt) try_to_freeze(); } /* make sure we are running before we exit */ - set_current_state(TASK_RUNNING); + __set_current_state(TASK_RUNNING); /* shut down socket, if a device is still present */ if (skt->state & SOCKET_PRESENT) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, Feb 05, 2015 at 10:14:36PM +0100, Bruno Prémont wrote: The loop is now replaced by a single WARN() trace - I guess expected: From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check is another issue left for the future. Yeah, something like the below will make it go away -- under the assumption that that comment is actually correct, I don't know, the pcmcia people should probably write a better comment :/ Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody cares about that barrier, so make it go away. --- drivers/pcmcia/cs.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/pcmcia/cs.c b/drivers/pcmcia/cs.c index 5292db69c426..5678e161a17d 100644 --- a/drivers/pcmcia/cs.c +++ b/drivers/pcmcia/cs.c @@ -635,6 +635,12 @@ static int pccardd(void *__skt) skt-sysfs_events = 0; spin_unlock_irqrestore(skt-thread_lock, flags); + /* +* Supposedly this is a rarely contended mutex and +* sleeping is therefore unlikely, the occasional +* extra loop iteration is harmless. +*/ + sched_annotate_sleep(); mutex_lock(skt-skt_mutex); if (events SS_DETECT) socket_detect_change(skt); @@ -679,7 +685,7 @@ static int pccardd(void *__skt) try_to_freeze(); } /* make sure we are running before we exit */ - set_current_state(TASK_RUNNING); + __set_current_state(TASK_RUNNING); /* shut down socket, if a device is still present */ if (skt-state SOCKET_PRESENT) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra pet...@infradead.org wrote: Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody cares about that barrier, so make it go away. I'd rather not mix this with the patch, and wonder if we should just do that globally with some preprocessor magic. We do have a fair number of set_current_state(TASK_RUNNING) and at least for the *documented* reason for the memory barrier, all of them could/should be barrier-less. So something like if (__is_constant_p(state) state == TASK_RUNNING) tsk-state = state; else set_mb(tsk-state, state); might be more general solution than randomly doing one at a time when changing code around it.. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Fri, Feb 06, 2015 at 08:02:41AM -0800, Linus Torvalds wrote: On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra pet...@infradead.org wrote: Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody cares about that barrier, so make it go away. I'd rather not mix this with the patch, and wonder if we should just do that globally with some preprocessor magic. We do have a fair number of set_current_state(TASK_RUNNING) 138 and at least for the *documented* reason for the memory barrier, all of them could/should be barrier-less. So something like if (__is_constant_p(state) state == TASK_RUNNING) tsk-state = state; else set_mb(tsk-state, state); might be more general solution than randomly doing one at a time when changing code around it.. Yeah, or we could do the coccinelle thing and do a mass conversion. I like the macro one though; I worry a wee bit about non-documented cases through. If someone is doing something way subtle we'll break it :/ --- Subject: sched: Avoid the full memory-barrier for set_current_state(TASK_RUNNING) One should never need the full memory barrier implied by set_current_state() to set TASK_RUNNING for the documented reason of avoiding races against wakeup. Suggested-by: Linus Torvalds torva...@linux-foundation.org Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org --- diff --git a/include/linux/sched.h b/include/linux/sched.h index 8db31ef98d2f..aea44c4eeed8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -243,6 +243,27 @@ extern char ___assert_task_state[1 - 2*!!( ((task-state TASK_UNINTERRUPTIBLE) != 0 \ (task-flags PF_FROZEN) == 0) +/* + * set_current_state() includes a barrier so that the write of current-state + * is correctly serialised wrt the caller's subsequent test of whether to + * actually sleep: + * + * set_current_state(TASK_UNINTERRUPTIBLE); + * if (do_i_need_to_sleep()) + * schedule(); + * + * If the caller does not need such serialisation then use + * __set_current_state(). This is always true for TASK_RUNNING since + * there is no race against wakeup -- both write the same value. + */ +#define ___set_current_state(state)\ +do { \ + if (__is_constant_p(state) (state) == TASK_RUNNING) \ + current-state = (state); \ + else\ + set_mb(current-state, (state));\ +} while (0) + #ifdef CONFIG_DEBUG_ATOMIC_SLEEP #define __set_task_state(tsk, state_value) \ @@ -256,17 +277,6 @@ extern char ___assert_task_state[1 - 2*!!( set_mb((tsk)-state, (state_value));\ } while (0) -/* - * set_current_state() includes a barrier so that the write of current-state - * is correctly serialised wrt the caller's subsequent test of whether to - * actually sleep: - * - * set_current_state(TASK_UNINTERRUPTIBLE); - * if (do_i_need_to_sleep()) - * schedule(); - * - * If the caller does not need such serialisation then use __set_current_state() - */ #define __set_current_state(state_value) \ do {\ current-task_state_change = _THIS_IP_; \ @@ -275,7 +285,7 @@ extern char ___assert_task_state[1 - 2*!!( #define set_current_state(state_value) \ do {\ current-task_state_change = _THIS_IP_; \ - set_mb(current-state, (state_value)); \ + ___set_current_state(state_value); \ } while (0) #else @@ -285,21 +295,10 @@ extern char ___assert_task_state[1 - 2*!!( #define set_task_state(tsk, state_value) \ set_mb((tsk)-state, (state_value)) -/* - * set_current_state() includes a barrier so that the write of current-state - * is correctly serialised wrt the caller's subsequent test of whether to - * actually sleep: - * - * set_current_state(TASK_UNINTERRUPTIBLE); - * if (do_i_need_to_sleep()) - * schedule(); - * - * If the caller does not need such serialisation then use __set_current_state() - */ #define __set_current_state(state_value) \ do { current-state = (state_value); } while (0) #define set_current_state(state_value) \ - set_mb(current-state, (state_value)) + ___set_current_state(state_value); #endif -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Fri, Feb 6, 2015 at 8:39 AM, Peter Zijlstra pet...@infradead.org wrote: I like the macro one though; I worry a wee bit about non-documented cases through. If someone is doing something way subtle we'll break it :/ Yeah, I agree. And I wonder how much we should even care. People who do this thing by hand - rather than using the wait-event model - are *likely* legacy stuff or just random drivers, or simply stuff that isn't all that performance-sensitive. So I dunno how worthwhile this all is. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, 29 January 2015 Linus Torvalds wrote: > On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote: > > > > The WARN() was already changed to a WARN_ONCE(). > > Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up > always happening. > > So I think the right fix is to: > > - warn once like we do > > - but *not* do that __set_current_state() which was always total crap anyway > > Why do I say "total crap"? Because of two independent issues: > > (a) it actually changes behavior for a debug vs non-debug kernel, > which is a really bad idea to begin with > > (b) it's really wrong. The whole "nested sleep" case was never a > major bug to begin with, just a possible inefficiency where constant > nested sleeps would possibly make the outer sleep not sleep. But that > "could possibly make" case was the unlikely case, and the debug patch > made it happen *all* the time by explicitly setting things running. > > So I think the proper patch is the attached. > > The comment is also crap. The comment says > > "Blocking primitives will set (and therefore destroy) current->state > [...]" > > but the reality is that they *may* set it, and only in the unlikely > slow-path where they actually block. > > So doing this in "__may_sleep()" is just bogus and horrible horrible > crap. It turns the "harmless ugliness" into a real *harmful* bug. The > key word of "__may_sleep()" is that "MAY" part. It's a debug thing to > make relatively rare cases show up. > > PeterZ, please don't make "debugging" patches like this. Ever again. > Because this was just stupid, and it took me too long to realize that > despite the warning being shut up, the debug patch was still actively > doing bad bad things. > > Ingo, maybe you'd want to apply this through the scheduler tree, the > way you already did the WARN_ONCE() thing. > > Bruno, does this finally actually fix your pccard thing? Tested the variant that was applied by running rc7 and it fixes the endless loop. The loop is now replaced by a single WARN() trace - I guess expected: [3.083647] [ cut here ] [3.087477] WARNING: CPU: 0 PID: 67 at /usr/src/linux-git/kernel/sched/core.c:7300 __might_sleep+0x79/0x90() [3.091357] do not call blocking ops when !TASK_RUNNING; state=1 set at [] pccardd+0xa0/0x3e0 [3.095232] Modules linked in: [3.099020] CPU: 0 PID: 67 Comm: pccardd Not tainted 3.19.0-rc7-3-g67288c4 #17 [3.102760] Hardware name: Acer TravelMate 660/TravelMate 660, BIOS 3A19 01/14/2004 [3.106504] c212def4 c212def4 c212deb4 c16caf23 c212dee4 c10416fd c1907334 c212df10 [3.110315] 0043 c1907380 1c84 c105a099 c105a099 c1442040 0001 f5f54bc0 [3.114143] c212defc c104176e 0009 c212def4 c1907334 c212df10 c212df30 c105a099 [3.117960] Call Trace: [3.121703] [] dump_stack+0x16/0x18 [3.125447] [] warn_slowpath_common+0x7d/0xc0 [3.129172] [] ? __might_sleep+0x79/0x90 [3.132868] [] ? __might_sleep+0x79/0x90 [3.136500] [] ? pccardd+0xa0/0x3e0 [3.140092] [] warn_slowpath_fmt+0x2e/0x30 [3.143657] [] __might_sleep+0x79/0x90 [3.147209] [] ? pccardd+0xa0/0x3e0 [3.150747] [] ? pccardd+0xa0/0x3e0 [3.154256] [] mutex_lock+0x17/0x2a [3.157734] [] pccardd+0xe9/0x3e0 [3.161207] [] ? pcmcia_socket_uevent+0x30/0x30 [3.164660] [] kthread+0xa0/0xc0 [3.168059] [] ret_from_kernel_thread+0x20/0x30 [3.171436] [] ? kthread_worker_fn+0x140/0x140 [3.174796] ---[ end trace c3f708b642e3d8f0 ]--- >From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check is another issue left for the future. Thanks, Bruno > Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, 29 January 2015 Linus Torvalds wrote: On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote: The WARN() was already changed to a WARN_ONCE(). Oh, but I notice that the __set_current_state(TASK_RUNNING) ends up always happening. So I think the right fix is to: - warn once like we do - but *not* do that __set_current_state() which was always total crap anyway Why do I say total crap? Because of two independent issues: (a) it actually changes behavior for a debug vs non-debug kernel, which is a really bad idea to begin with (b) it's really wrong. The whole nested sleep case was never a major bug to begin with, just a possible inefficiency where constant nested sleeps would possibly make the outer sleep not sleep. But that could possibly make case was the unlikely case, and the debug patch made it happen *all* the time by explicitly setting things running. So I think the proper patch is the attached. The comment is also crap. The comment says Blocking primitives will set (and therefore destroy) current-state [...] but the reality is that they *may* set it, and only in the unlikely slow-path where they actually block. So doing this in __may_sleep() is just bogus and horrible horrible crap. It turns the harmless ugliness into a real *harmful* bug. The key word of __may_sleep() is that MAY part. It's a debug thing to make relatively rare cases show up. PeterZ, please don't make debugging patches like this. Ever again. Because this was just stupid, and it took me too long to realize that despite the warning being shut up, the debug patch was still actively doing bad bad things. Ingo, maybe you'd want to apply this through the scheduler tree, the way you already did the WARN_ONCE() thing. Bruno, does this finally actually fix your pccard thing? Tested the variant that was applied by running rc7 and it fixes the endless loop. The loop is now replaced by a single WARN() trace - I guess expected: [3.083647] [ cut here ] [3.087477] WARNING: CPU: 0 PID: 67 at /usr/src/linux-git/kernel/sched/core.c:7300 __might_sleep+0x79/0x90() [3.091357] do not call blocking ops when !TASK_RUNNING; state=1 set at [c1442040] pccardd+0xa0/0x3e0 [3.095232] Modules linked in: [3.099020] CPU: 0 PID: 67 Comm: pccardd Not tainted 3.19.0-rc7-3-g67288c4 #17 [3.102760] Hardware name: Acer TravelMate 660/TravelMate 660, BIOS 3A19 01/14/2004 [3.106504] c212def4 c212def4 c212deb4 c16caf23 c212dee4 c10416fd c1907334 c212df10 [3.110315] 0043 c1907380 1c84 c105a099 c105a099 c1442040 0001 f5f54bc0 [3.114143] c212defc c104176e 0009 c212def4 c1907334 c212df10 c212df30 c105a099 [3.117960] Call Trace: [3.121703] [c16caf23] dump_stack+0x16/0x18 [3.125447] [c10416fd] warn_slowpath_common+0x7d/0xc0 [3.129172] [c105a099] ? __might_sleep+0x79/0x90 [3.132868] [c105a099] ? __might_sleep+0x79/0x90 [3.136500] [c1442040] ? pccardd+0xa0/0x3e0 [3.140092] [c104176e] warn_slowpath_fmt+0x2e/0x30 [3.143657] [c105a099] __might_sleep+0x79/0x90 [3.147209] [c1442040] ? pccardd+0xa0/0x3e0 [3.150747] [c1442040] ? pccardd+0xa0/0x3e0 [3.154256] [c16cf447] mutex_lock+0x17/0x2a [3.157734] [c1442089] pccardd+0xe9/0x3e0 [3.161207] [c1441fa0] ? pcmcia_socket_uevent+0x30/0x30 [3.164660] [c1056990] kthread+0xa0/0xc0 [3.168059] [c16d1040] ret_from_kernel_thread+0x20/0x30 [3.171436] [c10568f0] ? kthread_worker_fn+0x140/0x140 [3.174796] ---[ end trace c3f708b642e3d8f0 ]--- From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check is another issue left for the future. Thanks, Bruno Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
* Peter Zijlstra wrote: > On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote: > > > PeterZ, please don't make "debugging" patches like this. Ever > > again. Because this was just stupid, and it took me too long > > to realize that despite the warning being shut up, the debug > > patch was still actively doing bad bad things. > > Understood. It was my fault too - I should have realized this side effect of the new debugging facility when I merged it, which side effect, as Linus pointed it out, is a really bad idea. We are generally pretty good at doing transparent debugging in the scheduler and in the locking code, but screwed up this time around :-( Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
* Peter Zijlstra pet...@infradead.org wrote: On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote: PeterZ, please don't make debugging patches like this. Ever again. Because this was just stupid, and it took me too long to realize that despite the warning being shut up, the debug patch was still actively doing bad bad things. Understood. It was my fault too - I should have realized this side effect of the new debugging facility when I merged it, which side effect, as Linus pointed it out, is a really bad idea. We are generally pretty good at doing transparent debugging in the scheduler and in the locking code, but screwed up this time around :-( Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sun, Feb 01, 2015 at 12:09:32PM -0800, Linus Torvalds wrote: > Now, I have the patch that removes that thing (but I was hoping to get > it from the scheduler tree before doing rc7, which seems to not have > happened), but yes, that together with your patch seems like it should > fix all the nasty bug-inducing crud where the "debugging helpers" end > up silently changing core process state. > > I'll just combine it with yours to avoid extra noise in this area, and > mark you as the author, fixing *both* of the incorrect state changes. > Ok? Ah I see it in your tree; I was about to suggest: -# define sched_annotate_sleep()__set_current_state(TASK_RUNNING) +# define sched_annotate_sleep()do { current->task_state_change = 0; } while (0) Instead of the assignment, which has a rvalue. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sat, Jan 31, 2015 at 01:54:36PM -0800, Linus Torvalds wrote: > On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra wrote: > > > > All the stuff it flagged are genuinely wrong, albeit not disastrously > > so, things mostly just work. > > I really disagree. > > They weren't wrong. They *could* occasionally result in extra > reschedules, but that was never wrong to begin with. > > But the debugging code made that much much worse, and made that "could > result in extra reschedules" happen all the time if it triggered. Yes, that was a bad choice. Again, sorry about that. > So you say "genuinely wrong", and I say "but that's how things were > designed - it's an optimistic approach, not an exact one". Your > debugging code changed that behavior, and actually introduced a real > bug, exactly because you felt that the "no nested sleeps" was a harder > requirement than it has ever actually been. > > In other words, I think the debugging code itself is wrong, and then > that sched_annotate_sleep() thing is just a symptom of how it is > wrong. If you have to sprinkle these kinds of random workarounds in a > few core scheduler places (ok, mainly "wait()" paths, it looks like), > why would you expect random *drivers* to have to care about things > that even core kernel code says "I'm not going to care about this, > I'll just shut the warning up, because the warning is wrong". I was not aware this pattern was so wide spread / accepted. I have totally underestimated the amount of fallout here. > So don't get me wrong - I think the whole "add debug code to find > places where we might have issues" was well worth it, and resulted in > improvements. > > But once the low-hanging fruit and the core code that everybody hits > has been fixed, and people cannot apparently even be bothered with the > other cases it finds (like the pccardd case), at that point the value > of the debug code becomes rather less obvious. My bad (again) for not paying proper attention there. No excuses for that. > The pccardd example is an example of legacy use of our old and > original semantics of how the whole nested sleep was supposed to work. > And it *does* work. It's not a bug. It's how things have worked time > immemorial, and clearly nobody is really willing to bother with > changing working - but legacy - cardbus code. And at that point, I > think the debug code is actually *wrong*, and causes more problems > than it "fixes". > > And debug code that causes more problems that it fixes should either > be removed, or improved to the point where the problems go away. > > The "improved" part might be about saying "it's actually perfectly > _fine_ to have nested sleeps, as long as it is truly rare that the > nested sleep actually sleeps". And thus make the debug code really > test that it's *rare*, rather than test that it never happens. Warn if > it happens more than a couple of times a second for any particular > process, or something like that. So the thing that makes it work is the fact that schedule is called in a loop, without that loop things come apart very quickly. Now most core primitives have this loop, and are safe from spurious wakeups -- and I think we can class this under that. But there's heaps of legacy code that has non looping calls of schedule and really breaks with spurious wakeups (I ran into that a few years ago when I made schedule() return 'randomly' for some other reason). So I worry about separating nesting sleep in loops, which are mostly harmless vs the non-loop case which is really bad. FWIW the bug that started all this was someone calling blocking functions from io_schedule(): io_schedule() -> blk_flush_plug() -> block-layer-magic -> device-magic() -> mutex_lock(). And mutex_lock(() used: if (need_resched()) schedule(); for preemption and would never return (because !TASK_RUNNING). Now, we've since fixed the mutex code to not assume TASK_RUNNING, so we won't actually trigger the reported 'deadlock' anymore. In any case, I can certainly make the warning go away for now and try again later with a (hopefully) more intelligent version. On a related note, would it be possible to make sparse/coccinelle try and warn on broken wait primitives? ie, no loop around schedule(), no conditional after set_current_state()? Examples: drivers/atm/atmtcp.c- while (test_bit(flag,>flags) == old_test) { drivers/atm/atmtcp.c- mb(); drivers/atm/atmtcp.c- out_vcc = PRIV(vcc->dev) ? PRIV(vcc->dev)->vcc : NULL; drivers/atm/atmtcp.c- if (!out_vcc) { drivers/atm/atmtcp.c- error = -EUNATCH; drivers/atm/atmtcp.c- break; drivers/atm/atmtcp.c- } set_bit(flag, >flags); wake_up_process(foo); drivers/atm/atmtcp.c- set_current_state(TASK_UNINTERRUPTIBLE); drivers/atm/atmtcp.c: schedule(); drivers/atm/atmtcp.c- } Would not ever wake again it seems.
Re: Linux 3.19-rc5
Dne 30.1.2015 v 02:49 Rafael J. Wysocki napsal(a): On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote: On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki wrote: On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote: Yeah, but the debug check is triggering worse behavior, requiring bisecting back to the debug commit. Yes, it is. So I'm wondering is anyone is working on fixing this in any way? It kind of sucks when this is happening on an otherwise perfectly usable old(ish) machine ... The WARN() was already changed to a WARN_ONCE(). So that debug check doesn't cause problems any more. If somebody is bisecting something else, and the WARN() is a problem for those intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP should get you past that point. IOW, this really shouldn't be an issue. Does the pccard thing still not work? Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP unset, the problem with 99+% CPU load from pccardd goes away, so thanks for the hint. Ok - I can now 'use' 3.19-rc7 on T61 (C2D) without having a single core occupied on 100% with pccardd, but I still get this one logged: [2.185320] pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c-0x0f: [2.185363] excluding 0xc-0xc 0xe-0xf [2.185526] pcmcia_socket pcmcia_socket0: cs: memory probe 0xa000-0xa0ff: [2.185559] excluding 0xa000-0xa0ff [2.185720] pcmcia_socket pcmcia_socket0: cs: memory probe 0x6000-0x60ff: [2.185754] excluding 0x6000-0x60ff [2.297692] [ cut here ] [2.297698] WARNING: CPU: 1 PID: 185 at kernel/sched/core.c:7300 __might_sleep+0xae/0xc0() [2.297702] do not call blocking ops when !TASK_RUNNING; state=1 set at [] pccardd+0xe8/0x490 [2.297712] Modules linked in: uhci_hcd sr_mod cdrom ehci_pci ehci_hcd yenta_socket usbcore usb_common video backlight autofs4 [2.297715] CPU: 1 PID: 185 Comm: pccardd Not tainted 3.19.0-rc7-2-g9afdf96 #5 [2.297716] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011 [2.297719] 819e4d8a 8800b8bdbc58 8163926d 810a9e1e [2.297721] 8800b8bdbca8 8800b8bdbc98 810587ba 8800 [2.297724] 819e6073 026d 0080 [2.297725] Call Trace: [2.297729] [] dump_stack+0x4f/0x7b [2.297732] [] ? down_trylock+0x2e/0x40 [2.297736] [] warn_slowpath_common+0x8a/0xc0 [2.297738] [] warn_slowpath_fmt+0x46/0x50 [2.297741] [] ? pccardd+0xe8/0x490 [2.297742] [] ? pccardd+0xe8/0x490 [2.297744] [] __might_sleep+0xae/0xc0 [2.297747] [] mutex_lock_nested+0x2e/0x440 [2.297749] [] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [2.297753] [] ? preempt_count_sub+0xab/0x100 [2.297755] [] pccardd+0x150/0x490 [2.297757] [] ? pcmcia_socket_dev_complete+0x40/0x40 [2.297760] [] kthread+0x11f/0x140 [2.297763] [] ? kthread_create_on_node+0x260/0x260 [2.297766] [] ret_from_fork+0x7c/0xb0 [2.297768] [] ? kthread_create_on_node+0x260/0x260 [2.297770] ---[ end trace ed9e591061d223e6 ]--- [2.464635] usb 3-1: new full-speed USB device number 2 using uhci_hcd and of course number of these: [ 29.660708] i915 :00:02.0: fb0: inteldrmfb frame buffer device [ 29.667061] i915 :00:02.0: registered panic notifier [ 30.314206] [ cut here ] [ 30.318866] WARNING: CPU: 0 PID: 1010 at drivers/gpu/drm/drm_irq.c:1121 drm_wait_one_vblank+0x180/0x190 [drm]() [ 30.329085] vblank not available on crtc 1, ret=-22 [ 30.334003] Modules linked in: i915 i2c_algo_bit drm_kms_helper drm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun brid ge stp llc ipv6 ebtables ip6_tables iptable_filter ip_tables x_tables bnep iTCO_wdt hid_generic iTCO_vendor_support snd_hda_codec_analo g snd_hda_codec_generic coretemp kvm_intel kvm microcode usbhid psmouse serio_raw hid arc4 r852 sm_common nand i2c_i801 nand_ecc i2c_co re nand_ids mtd btusb bluetooth iwl3945 sdhci_pci r592 iwlegacy memstick pcmcia snd_hda_intel sdhci mac80211 snd_hda_controller mmc_cor e snd_hda_codec snd_hwdep lpc_ich snd_seq mfd_core snd_seq_device snd_pcm e1000e cfg80211 ptp thinkpad_acpi snd_timer pps_core wmi nvra m [ 30.406592] snd soundcore evdev nfsd auth_rpcgss oid_registry nfs_acl lockd grace binfmt_misc loop sunrpc uhci_hcd sr_mod cdrom ehc i_pci ehci_hcd yenta_socket usbcore usb_common video backlight autofs4 [ 30.424031] CPU: 1 PID: 1010 Comm: Xorg.bin Tainted: GW 3.19.0-rc7-2-g9afdf96 #5 [ 30.432930] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011 [ 30.440627] a0e1017f 880135a37a28 8163926d
Re: Linux 3.19-rc5
On Sat, Jan 31, 2015 at 01:54:36PM -0800, Linus Torvalds wrote: On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra pet...@infradead.org wrote: All the stuff it flagged are genuinely wrong, albeit not disastrously so, things mostly just work. I really disagree. They weren't wrong. They *could* occasionally result in extra reschedules, but that was never wrong to begin with. But the debugging code made that much much worse, and made that could result in extra reschedules happen all the time if it triggered. Yes, that was a bad choice. Again, sorry about that. So you say genuinely wrong, and I say but that's how things were designed - it's an optimistic approach, not an exact one. Your debugging code changed that behavior, and actually introduced a real bug, exactly because you felt that the no nested sleeps was a harder requirement than it has ever actually been. In other words, I think the debugging code itself is wrong, and then that sched_annotate_sleep() thing is just a symptom of how it is wrong. If you have to sprinkle these kinds of random workarounds in a few core scheduler places (ok, mainly wait() paths, it looks like), why would you expect random *drivers* to have to care about things that even core kernel code says I'm not going to care about this, I'll just shut the warning up, because the warning is wrong. I was not aware this pattern was so wide spread / accepted. I have totally underestimated the amount of fallout here. So don't get me wrong - I think the whole add debug code to find places where we might have issues was well worth it, and resulted in improvements. But once the low-hanging fruit and the core code that everybody hits has been fixed, and people cannot apparently even be bothered with the other cases it finds (like the pccardd case), at that point the value of the debug code becomes rather less obvious. My bad (again) for not paying proper attention there. No excuses for that. The pccardd example is an example of legacy use of our old and original semantics of how the whole nested sleep was supposed to work. And it *does* work. It's not a bug. It's how things have worked time immemorial, and clearly nobody is really willing to bother with changing working - but legacy - cardbus code. And at that point, I think the debug code is actually *wrong*, and causes more problems than it fixes. And debug code that causes more problems that it fixes should either be removed, or improved to the point where the problems go away. The improved part might be about saying it's actually perfectly _fine_ to have nested sleeps, as long as it is truly rare that the nested sleep actually sleeps. And thus make the debug code really test that it's *rare*, rather than test that it never happens. Warn if it happens more than a couple of times a second for any particular process, or something like that. So the thing that makes it work is the fact that schedule is called in a loop, without that loop things come apart very quickly. Now most core primitives have this loop, and are safe from spurious wakeups -- and I think we can class this under that. But there's heaps of legacy code that has non looping calls of schedule and really breaks with spurious wakeups (I ran into that a few years ago when I made schedule() return 'randomly' for some other reason). So I worry about separating nesting sleep in loops, which are mostly harmless vs the non-loop case which is really bad. FWIW the bug that started all this was someone calling blocking functions from io_schedule(): io_schedule() - blk_flush_plug() - block-layer-magic - device-magic() - mutex_lock(). And mutex_lock(() used: if (need_resched()) schedule(); for preemption and would never return (because !TASK_RUNNING). Now, we've since fixed the mutex code to not assume TASK_RUNNING, so we won't actually trigger the reported 'deadlock' anymore. In any case, I can certainly make the warning go away for now and try again later with a (hopefully) more intelligent version. On a related note, would it be possible to make sparse/coccinelle try and warn on broken wait primitives? ie, no loop around schedule(), no conditional after set_current_state()? Examples: drivers/atm/atmtcp.c- while (test_bit(flag,vcc-flags) == old_test) { drivers/atm/atmtcp.c- mb(); drivers/atm/atmtcp.c- out_vcc = PRIV(vcc-dev) ? PRIV(vcc-dev)-vcc : NULL; drivers/atm/atmtcp.c- if (!out_vcc) { drivers/atm/atmtcp.c- error = -EUNATCH; drivers/atm/atmtcp.c- break; drivers/atm/atmtcp.c- } set_bit(flag, vcc-flags); wake_up_process(foo); drivers/atm/atmtcp.c- set_current_state(TASK_UNINTERRUPTIBLE); drivers/atm/atmtcp.c: schedule(); drivers/atm/atmtcp.c- } Would not ever wake again it seems. And I'm not entire sure what this is supposed to do,
Re: Linux 3.19-rc5
Dne 30.1.2015 v 02:49 Rafael J. Wysocki napsal(a): On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote: On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote: Yeah, but the debug check is triggering worse behavior, requiring bisecting back to the debug commit. Yes, it is. So I'm wondering is anyone is working on fixing this in any way? It kind of sucks when this is happening on an otherwise perfectly usable old(ish) machine ... The WARN() was already changed to a WARN_ONCE(). So that debug check doesn't cause problems any more. If somebody is bisecting something else, and the WARN() is a problem for those intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP should get you past that point. IOW, this really shouldn't be an issue. Does the pccard thing still not work? Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP unset, the problem with 99+% CPU load from pccardd goes away, so thanks for the hint. Ok - I can now 'use' 3.19-rc7 on T61 (C2D) without having a single core occupied on 100% with pccardd, but I still get this one logged: [2.185320] pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c-0x0f: [2.185363] excluding 0xc-0xc 0xe-0xf [2.185526] pcmcia_socket pcmcia_socket0: cs: memory probe 0xa000-0xa0ff: [2.185559] excluding 0xa000-0xa0ff [2.185720] pcmcia_socket pcmcia_socket0: cs: memory probe 0x6000-0x60ff: [2.185754] excluding 0x6000-0x60ff [2.297692] [ cut here ] [2.297698] WARNING: CPU: 1 PID: 185 at kernel/sched/core.c:7300 __might_sleep+0xae/0xc0() [2.297702] do not call blocking ops when !TASK_RUNNING; state=1 set at [81518e18] pccardd+0xe8/0x490 [2.297712] Modules linked in: uhci_hcd sr_mod cdrom ehci_pci ehci_hcd yenta_socket usbcore usb_common video backlight autofs4 [2.297715] CPU: 1 PID: 185 Comm: pccardd Not tainted 3.19.0-rc7-2-g9afdf96 #5 [2.297716] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011 [2.297719] 819e4d8a 8800b8bdbc58 8163926d 810a9e1e [2.297721] 8800b8bdbca8 8800b8bdbc98 810587ba 8800 [2.297724] 819e6073 026d 0080 [2.297725] Call Trace: [2.297729] [8163926d] dump_stack+0x4f/0x7b [2.297732] [810a9e1e] ? down_trylock+0x2e/0x40 [2.297736] [810587ba] warn_slowpath_common+0x8a/0xc0 [2.297738] [81058836] warn_slowpath_fmt+0x46/0x50 [2.297741] [81518e18] ? pccardd+0xe8/0x490 [2.297742] [81518e18] ? pccardd+0xe8/0x490 [2.297744] [8108412e] __might_sleep+0xae/0xc0 [2.297747] [8163cd7e] mutex_lock_nested+0x2e/0x440 [2.297749] [8164189d] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [2.297753] [8108b42b] ? preempt_count_sub+0xab/0x100 [2.297755] [81518e80] pccardd+0x150/0x490 [2.297757] [81518d30] ? pcmcia_socket_dev_complete+0x40/0x40 [2.297760] [8107db1f] kthread+0x11f/0x140 [2.297763] [8107da00] ? kthread_create_on_node+0x260/0x260 [2.297766] [8164242c] ret_from_fork+0x7c/0xb0 [2.297768] [8107da00] ? kthread_create_on_node+0x260/0x260 [2.297770] ---[ end trace ed9e591061d223e6 ]--- [2.464635] usb 3-1: new full-speed USB device number 2 using uhci_hcd and of course number of these: [ 29.660708] i915 :00:02.0: fb0: inteldrmfb frame buffer device [ 29.667061] i915 :00:02.0: registered panic notifier [ 30.314206] [ cut here ] [ 30.318866] WARNING: CPU: 0 PID: 1010 at drivers/gpu/drm/drm_irq.c:1121 drm_wait_one_vblank+0x180/0x190 [drm]() [ 30.329085] vblank not available on crtc 1, ret=-22 [ 30.334003] Modules linked in: i915 i2c_algo_bit drm_kms_helper drm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun brid ge stp llc ipv6 ebtables ip6_tables iptable_filter ip_tables x_tables bnep iTCO_wdt hid_generic iTCO_vendor_support snd_hda_codec_analo g snd_hda_codec_generic coretemp kvm_intel kvm microcode usbhid psmouse serio_raw hid arc4 r852 sm_common nand i2c_i801 nand_ecc i2c_co re nand_ids mtd btusb bluetooth iwl3945 sdhci_pci r592 iwlegacy memstick pcmcia snd_hda_intel sdhci mac80211 snd_hda_controller mmc_cor e snd_hda_codec snd_hwdep lpc_ich snd_seq mfd_core snd_seq_device snd_pcm e1000e cfg80211 ptp thinkpad_acpi snd_timer pps_core wmi nvra m [ 30.406592] snd soundcore evdev nfsd auth_rpcgss oid_registry nfs_acl lockd grace binfmt_misc loop sunrpc uhci_hcd sr_mod cdrom ehc i_pci ehci_hcd yenta_socket usbcore
Re: Linux 3.19-rc5
On Sun, Feb 01, 2015 at 12:09:32PM -0800, Linus Torvalds wrote: Now, I have the patch that removes that thing (but I was hoping to get it from the scheduler tree before doing rc7, which seems to not have happened), but yes, that together with your patch seems like it should fix all the nasty bug-inducing crud where the debugging helpers end up silently changing core process state. I'll just combine it with yours to avoid extra noise in this area, and mark you as the author, fixing *both* of the incorrect state changes. Ok? Ah I see it in your tree; I was about to suggest: -# define sched_annotate_sleep()__set_current_state(TASK_RUNNING) +# define sched_annotate_sleep()do { current-task_state_change = 0; } while (0) Instead of the assignment, which has a rvalue. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On 02/01, Linus Torvalds wrote: > > On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov wrote: > > > > And personally I agree. sched_annotate_sleep() looks self-documented, it > > is clear that it is used to suppress the warning. > > But *that's not the problem*. > > If it was just silencing the warning, things would be fine. > > But it is actively screwing task state up, and actually changing > behavior of the kernel (in a very subtle part of the code too), and > doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE > DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE. I understand, that is why I suggested to change it. > I like your patch, but I'm going to combine it with mine that actually > fixes a real bug, Sure, I agree. > because what you don't see in that patch of yours: > ... > > > is that the whole "if (WARN_ONCE()" remains horribly buggy, because of > the line that follows it: > > __set_current_state(TASK_RUNNING); > > in the if-statement. Ah. I just forgot to mention that this change should be rediffed on top of your patch, of course it is not enough. > I'll just combine it with yours to avoid extra noise in this area, and > mark you as the author, fixing *both* of the incorrect state changes. > Ok? Please combine them, but don't mark me as an author, I do not want to take the false credits ;) Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov wrote: > > And personally I agree. sched_annotate_sleep() looks self-documented, it > is clear that it is used to suppress the warning. But *that's not the problem*. If it was just silencing the warning, things would be fine. But it is actively screwing task state up, and actually changing behavior of the kernel (in a very subtle part of the code too), and doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE. That's the thing that upsets me. This is debug code. It's not even debugging things that are "buggy" - its' just finding things that can be potentially slightly inefficient. And it already introduced a subtle bug once. (Ok, it wasn't *that* subtle in the end, but it needed both a good bisection result and some reading of unrelated source code to figure out, so it certainly wasn't really obvious). > Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP > adds? Now this patch I agree with 100% percent, apart from you calling it a "subtle change". It's more than a "subtle change", this was adding a real and present bug that took two weeks to get fixed! (And by "took two weeks to get fixed" I mean "not actually fixed yet, but now we at least know what was going on"). I like your patch, but I'm going to combine it with mine that actually fixes a real bug, because what you don't see in that patch of yours: > --- x/include/linux/kernel.h > +++ x/include/linux/kernel.h > @@ -176,7 +176,7 @@ extern int _cond_resched(void); > */ > # define might_sleep() \ > do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while > (0) > -# define sched_annotate_sleep()__set_current_state(TASK_RUNNING) > +# define sched_annotate_sleep()(current->task_state_change = 0) > #else >static inline void ___might_sleep(const char *file, int line, >int preempt_offset) { } > --- x/kernel/sched/core.c > +++ x/kernel/sched/core.c > @@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int > * since we will exit with TASK_RUNNING make sure we enter with it, > * otherwise we will destroy state. > */ > - if (WARN_ONCE(current->state != TASK_RUNNING, > + if (WARN_ONCE(current->state != TASK_RUNNING && > current->task_state_change, > "do not call blocking ops when !TASK_RUNNING; " > "state=%lx set at [<%p>] %pS\n", > current->state, is that the whole "if (WARN_ONCE()" remains horribly buggy, because of the line that follows it: __set_current_state(TASK_RUNNING); in the if-statement. Now, I have the patch that removes that thing (but I was hoping to get it from the scheduler tree before doing rc7, which seems to not have happened), but yes, that together with your patch seems like it should fix all the nasty bug-inducing crud where the "debugging helpers" end up silently changing core process state. I'll just combine it with yours to avoid extra noise in this area, and mark you as the author, fixing *both* of the incorrect state changes. Ok? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On 01/31, Peter Zijlstra wrote: > > On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote: > > On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov wrote: > > > > > > Perhaps sched_annotate_sleep() shouldn't depend on > > > CONFIG_DEBUG_ATOMIC_SLEEP > > > too... > > > > Ugh. That thing is horrible. The naming doesn't make it obvious at all > > that it's actually making sure that we have state set to TASK_RUNNING, > > and I could easily imagine that it would cause similar "busy-loops > > while scheduling" issues if anybody ever uses it in the wrong context. > > The alternative was putting unconditional __set_task_state(TASK_RUNNING) > things in a few code paths. I didn't want to cause the extra code in > case we didn't need them. Particularly the include/net/sock.h one, as I > know the network people are cycle counters. And personally I agree. sched_annotate_sleep() looks self-documented, it is clear that it is used to suppress the warning. Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds? Oleg. --- x/include/linux/kernel.h +++ x/include/linux/kernel.h @@ -176,7 +176,7 @@ extern int _cond_resched(void); */ # define might_sleep() \ do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0) -# define sched_annotate_sleep()__set_current_state(TASK_RUNNING) +# define sched_annotate_sleep()(current->task_state_change = 0) #else static inline void ___might_sleep(const char *file, int line, int preempt_offset) { } --- x/kernel/sched/core.c +++ x/kernel/sched/core.c @@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int * since we will exit with TASK_RUNNING make sure we enter with it, * otherwise we will destroy state. */ - if (WARN_ONCE(current->state != TASK_RUNNING, + if (WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change, "do not call blocking ops when !TASK_RUNNING; " "state=%lx set at [<%p>] %pS\n", current->state, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On 01/31, Peter Zijlstra wrote: On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote: On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov o...@redhat.com wrote: Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP too... Ugh. That thing is horrible. The naming doesn't make it obvious at all that it's actually making sure that we have state set to TASK_RUNNING, and I could easily imagine that it would cause similar busy-loops while scheduling issues if anybody ever uses it in the wrong context. The alternative was putting unconditional __set_task_state(TASK_RUNNING) things in a few code paths. I didn't want to cause the extra code in case we didn't need them. Particularly the include/net/sock.h one, as I know the network people are cycle counters. And personally I agree. sched_annotate_sleep() looks self-documented, it is clear that it is used to suppress the warning. Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds? Oleg. --- x/include/linux/kernel.h +++ x/include/linux/kernel.h @@ -176,7 +176,7 @@ extern int _cond_resched(void); */ # define might_sleep() \ do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0) -# define sched_annotate_sleep()__set_current_state(TASK_RUNNING) +# define sched_annotate_sleep()(current-task_state_change = 0) #else static inline void ___might_sleep(const char *file, int line, int preempt_offset) { } --- x/kernel/sched/core.c +++ x/kernel/sched/core.c @@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int * since we will exit with TASK_RUNNING make sure we enter with it, * otherwise we will destroy state. */ - if (WARN_ONCE(current-state != TASK_RUNNING, + if (WARN_ONCE(current-state != TASK_RUNNING current-task_state_change, do not call blocking ops when !TASK_RUNNING; state=%lx set at [%p] %pS\n, current-state, -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On 02/01, Linus Torvalds wrote: On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov o...@redhat.com wrote: And personally I agree. sched_annotate_sleep() looks self-documented, it is clear that it is used to suppress the warning. But *that's not the problem*. If it was just silencing the warning, things would be fine. But it is actively screwing task state up, and actually changing behavior of the kernel (in a very subtle part of the code too), and doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE. I understand, that is why I suggested to change it. I like your patch, but I'm going to combine it with mine that actually fixes a real bug, Sure, I agree. because what you don't see in that patch of yours: ... is that the whole if (WARN_ONCE() remains horribly buggy, because of the line that follows it: __set_current_state(TASK_RUNNING); in the if-statement. Ah. I just forgot to mention that this change should be rediffed on top of your patch, of course it is not enough. I'll just combine it with yours to avoid extra noise in this area, and mark you as the author, fixing *both* of the incorrect state changes. Ok? Please combine them, but don't mark me as an author, I do not want to take the false credits ;) Oleg. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov o...@redhat.com wrote: And personally I agree. sched_annotate_sleep() looks self-documented, it is clear that it is used to suppress the warning. But *that's not the problem*. If it was just silencing the warning, things would be fine. But it is actively screwing task state up, and actually changing behavior of the kernel (in a very subtle part of the code too), and doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE. That's the thing that upsets me. This is debug code. It's not even debugging things that are buggy - its' just finding things that can be potentially slightly inefficient. And it already introduced a subtle bug once. (Ok, it wasn't *that* subtle in the end, but it needed both a good bisection result and some reading of unrelated source code to figure out, so it certainly wasn't really obvious). Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds? Now this patch I agree with 100% percent, apart from you calling it a subtle change. It's more than a subtle change, this was adding a real and present bug that took two weeks to get fixed! (And by took two weeks to get fixed I mean not actually fixed yet, but now we at least know what was going on). I like your patch, but I'm going to combine it with mine that actually fixes a real bug, because what you don't see in that patch of yours: --- x/include/linux/kernel.h +++ x/include/linux/kernel.h @@ -176,7 +176,7 @@ extern int _cond_resched(void); */ # define might_sleep() \ do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0) -# define sched_annotate_sleep()__set_current_state(TASK_RUNNING) +# define sched_annotate_sleep()(current-task_state_change = 0) #else static inline void ___might_sleep(const char *file, int line, int preempt_offset) { } --- x/kernel/sched/core.c +++ x/kernel/sched/core.c @@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int * since we will exit with TASK_RUNNING make sure we enter with it, * otherwise we will destroy state. */ - if (WARN_ONCE(current-state != TASK_RUNNING, + if (WARN_ONCE(current-state != TASK_RUNNING current-task_state_change, do not call blocking ops when !TASK_RUNNING; state=%lx set at [%p] %pS\n, current-state, is that the whole if (WARN_ONCE() remains horribly buggy, because of the line that follows it: __set_current_state(TASK_RUNNING); in the if-statement. Now, I have the patch that removes that thing (but I was hoping to get it from the scheduler tree before doing rc7, which seems to not have happened), but yes, that together with your patch seems like it should fix all the nasty bug-inducing crud where the debugging helpers end up silently changing core process state. I'll just combine it with yours to avoid extra noise in this area, and mark you as the author, fixing *both* of the incorrect state changes. Ok? Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra wrote: > > All the stuff it flagged are genuinely wrong, albeit not disastrously > so, things mostly just work. I really disagree. They weren't wrong. They *could* occasionally result in extra reschedules, but that was never wrong to begin with. But the debugging code made that much much worse, and made that "could result in extra reschedules" happen all the time if it triggered. And the whole "set task to sleep early" thing was rather intentional, and was a fundamental part of the original design for "select()/poll()" behavior, for example. Yes, we ended up having the waitqueue functions and using that "pollwake()" and use "wq->triggered" etc instead, but the whole optimistic early sleep thing worked reasonably well for a long time even when the "early TASK_SLEEP" was done before calling *thousands* of random poll routines. So you say "genuinely wrong", and I say "but that's how things were designed - it's an optimistic approach, not an exact one". Your debugging code changed that behavior, and actually introduced a real bug, exactly because you felt that the "no nested sleeps" was a harder requirement than it has ever actually been. In other words, I think the debugging code itself is wrong, and then that sched_annotate_sleep() thing is just a symptom of how it is wrong. If you have to sprinkle these kinds of random workarounds in a few core scheduler places (ok, mainly "wait()" paths, it looks like), why would you expect random *drivers* to have to care about things that even core kernel code says "I'm not going to care about this, I'll just shut the warning up, because the warning is wrong". Yes, the fact that select/poll was changed to try to avoid excessive polling scheduling because it actually *was* a problem under some loads does say that we generally want to try to avoid nested sleeping. Because while it is rare and the optimistic approach works fine in most cases, it certainly *can* become a problem if the optimistic "I'm normally not going to sleep" thing ends up not being sufficiently accurate. So don't get me wrong - I think the whole "add debug code to find places where we might have issues" was well worth it, and resulted in improvements. But once the low-hanging fruit and the core code that everybody hits has been fixed, and people cannot apparently even be bothered with the other cases it finds (like the pccardd case), at that point the value of the debug code becomes rather less obvious. And the downsides become bigger. The pccardd example is an example of legacy use of our old and original semantics of how the whole nested sleep was supposed to work. And it *does* work. It's not a bug. It's how things have worked time immemorial, and clearly nobody is really willing to bother with changing working - but legacy - cardbus code. And at that point, I think the debug code is actually *wrong*, and causes more problems than it "fixes". And debug code that causes more problems that it fixes should either be removed, or improved to the point where the problems go away. The "improved" part might be about saying "it's actually perfectly _fine_ to have nested sleeps, as long as it is truly rare that the nested sleep actually sleeps". And thus make the debug code really test that it's *rare*, rather than test that it never happens. Warn if it happens more than a couple of times a second for any particular process, or something like that. Hmm? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote: > On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov wrote: > > > > Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP > > too... > > Ugh. That thing is horrible. The naming doesn't make it obvious at all > that it's actually making sure that we have state set to TASK_RUNNING, > and I could easily imagine that it would cause similar "busy-loops > while scheduling" issues if anybody ever uses it in the wrong context. The alternative was putting unconditional __set_task_state(TASK_RUNNING) things in a few code paths. I didn't want to cause the extra code in case we didn't need them. Particularly the include/net/sock.h one, as I know the network people are cycle counters. But this function should indeed be used very rarely. Currently we're at 5 instances. > So I really think that whole thing is a sign of "the debug > infrastructure is buggy, and people are introducing fragile things to > just shut up the false positives". Aside from this one annotation, which is used in cases where we broke out of the wait loop but haven't reset task state yet, there have not really been false positives. All the stuff it flagged are genuinely wrong, albeit not disastrously so, things mostly just work. Should I make the default wait_event() safe against sleeps? It would make it slightly more expensive, but on the whole that should not really be a problem. Alternatively we should provide an alternative to wait_event() that allows sleeps, but I'm failing to come up with a good name. > I don't know how to fix it. I really get the feeling that the whole > new "nested sleep" detection code was a mistake to begin with, since > it wasn't even a real bug, and it has now created more bugs than it > ever detected afaik. I appreciate I caused you some pain here, and I'm very sorry for that. But I do think we should be little more careful with task::state, on occasion things do go wrong there and funny stuff happens. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov wrote: > > Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP > too... Ugh. That thing is horrible. The naming doesn't make it obvious at all that it's actually making sure that we have state set to TASK_RUNNING, and I could easily imagine that it would cause similar "busy-loops while scheduling" issues if anybody ever uses it in the wrong context. So I really think that whole thing is a sign of "the debug infrastructure is buggy, and people are introducing fragile things to just shut up the false positives". I don't know how to fix it. I really get the feeling that the whole new "nested sleep" detection code was a mistake to begin with, since it wasn't even a real bug, and it has now created more bugs than it ever detected afaik. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote: > PeterZ, please don't make "debugging" patches like this. Ever again. > Because this was just stupid, and it took me too long to realize that > despite the warning being shut up, the debug patch was still actively > doing bad bad things. Understood. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, 29 Jan 2015 17:25:07 Linus Torvalds wrote: > On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote: > > > > The WARN() was already changed to a WARN_ONCE(). > > Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up > always happening. > > So I think the right fix is to: > > - warn once like we do > > - but *not* do that __set_current_state() which was always total > crap anyway > > Why do I say "total crap"? Because of two independent issues: > > (a) it actually changes behavior for a debug vs non-debug kernel, > which is a really bad idea to begin with > > (b) it's really wrong. The whole "nested sleep" case was never a > major bug to begin with, just a possible inefficiency where constant > nested sleeps would possibly make the outer sleep not sleep. But that > "could possibly make" case was the unlikely case, and the debug patch > made it happen *all* the time by explicitly setting things running. > > So I think the proper patch is the attached. > > The comment is also crap. The comment says > > "Blocking primitives will set (and therefore destroy) > current->state [...]" > > but the reality is that they *may* set it, and only in the unlikely > slow-path where they actually block. > > So doing this in "__may_sleep()" is just bogus and horrible horrible > crap. It turns the "harmless ugliness" into a real *harmful* bug. The > key word of "__may_sleep()" is that "MAY" part. It's a debug thing to > make relatively rare cases show up. > > PeterZ, please don't make "debugging" patches like this. Ever again. > Because this was just stupid, and it took me too long to realize that > despite the warning being shut up, the debug patch was still actively > doing bad bad things. > > Ingo, maybe you'd want to apply this through the scheduler tree, the > way you already did the WARN_ONCE() thing. > > Bruno, does this finally actually fix your pccard thing? I will report back on Wednesday when I'm back home from FOSDEM. I don't have the affected machine at hand at the moment. Thanks for looking into it! Bruno > Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, 29 Jan 2015 17:25:07 Linus Torvalds wrote: On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote: The WARN() was already changed to a WARN_ONCE(). Oh, but I notice that the __set_current_state(TASK_RUNNING) ends up always happening. So I think the right fix is to: - warn once like we do - but *not* do that __set_current_state() which was always total crap anyway Why do I say total crap? Because of two independent issues: (a) it actually changes behavior for a debug vs non-debug kernel, which is a really bad idea to begin with (b) it's really wrong. The whole nested sleep case was never a major bug to begin with, just a possible inefficiency where constant nested sleeps would possibly make the outer sleep not sleep. But that could possibly make case was the unlikely case, and the debug patch made it happen *all* the time by explicitly setting things running. So I think the proper patch is the attached. The comment is also crap. The comment says Blocking primitives will set (and therefore destroy) current-state [...] but the reality is that they *may* set it, and only in the unlikely slow-path where they actually block. So doing this in __may_sleep() is just bogus and horrible horrible crap. It turns the harmless ugliness into a real *harmful* bug. The key word of __may_sleep() is that MAY part. It's a debug thing to make relatively rare cases show up. PeterZ, please don't make debugging patches like this. Ever again. Because this was just stupid, and it took me too long to realize that despite the warning being shut up, the debug patch was still actively doing bad bad things. Ingo, maybe you'd want to apply this through the scheduler tree, the way you already did the WARN_ONCE() thing. Bruno, does this finally actually fix your pccard thing? I will report back on Wednesday when I'm back home from FOSDEM. I don't have the affected machine at hand at the moment. Thanks for looking into it! Bruno Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov o...@redhat.com wrote: Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP too... Ugh. That thing is horrible. The naming doesn't make it obvious at all that it's actually making sure that we have state set to TASK_RUNNING, and I could easily imagine that it would cause similar busy-loops while scheduling issues if anybody ever uses it in the wrong context. So I really think that whole thing is a sign of the debug infrastructure is buggy, and people are introducing fragile things to just shut up the false positives. I don't know how to fix it. I really get the feeling that the whole new nested sleep detection code was a mistake to begin with, since it wasn't even a real bug, and it has now created more bugs than it ever detected afaik. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote: On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov o...@redhat.com wrote: Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP too... Ugh. That thing is horrible. The naming doesn't make it obvious at all that it's actually making sure that we have state set to TASK_RUNNING, and I could easily imagine that it would cause similar busy-loops while scheduling issues if anybody ever uses it in the wrong context. The alternative was putting unconditional __set_task_state(TASK_RUNNING) things in a few code paths. I didn't want to cause the extra code in case we didn't need them. Particularly the include/net/sock.h one, as I know the network people are cycle counters. But this function should indeed be used very rarely. Currently we're at 5 instances. So I really think that whole thing is a sign of the debug infrastructure is buggy, and people are introducing fragile things to just shut up the false positives. Aside from this one annotation, which is used in cases where we broke out of the wait loop but haven't reset task state yet, there have not really been false positives. All the stuff it flagged are genuinely wrong, albeit not disastrously so, things mostly just work. Should I make the default wait_event() safe against sleeps? It would make it slightly more expensive, but on the whole that should not really be a problem. Alternatively we should provide an alternative to wait_event() that allows sleeps, but I'm failing to come up with a good name. I don't know how to fix it. I really get the feeling that the whole new nested sleep detection code was a mistake to begin with, since it wasn't even a real bug, and it has now created more bugs than it ever detected afaik. I appreciate I caused you some pain here, and I'm very sorry for that. But I do think we should be little more careful with task::state, on occasion things do go wrong there and funny stuff happens. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra pet...@infradead.org wrote: All the stuff it flagged are genuinely wrong, albeit not disastrously so, things mostly just work. I really disagree. They weren't wrong. They *could* occasionally result in extra reschedules, but that was never wrong to begin with. But the debugging code made that much much worse, and made that could result in extra reschedules happen all the time if it triggered. And the whole set task to sleep early thing was rather intentional, and was a fundamental part of the original design for select()/poll() behavior, for example. Yes, we ended up having the waitqueue functions and using that pollwake() and use wq-triggered etc instead, but the whole optimistic early sleep thing worked reasonably well for a long time even when the early TASK_SLEEP was done before calling *thousands* of random poll routines. So you say genuinely wrong, and I say but that's how things were designed - it's an optimistic approach, not an exact one. Your debugging code changed that behavior, and actually introduced a real bug, exactly because you felt that the no nested sleeps was a harder requirement than it has ever actually been. In other words, I think the debugging code itself is wrong, and then that sched_annotate_sleep() thing is just a symptom of how it is wrong. If you have to sprinkle these kinds of random workarounds in a few core scheduler places (ok, mainly wait() paths, it looks like), why would you expect random *drivers* to have to care about things that even core kernel code says I'm not going to care about this, I'll just shut the warning up, because the warning is wrong. Yes, the fact that select/poll was changed to try to avoid excessive polling scheduling because it actually *was* a problem under some loads does say that we generally want to try to avoid nested sleeping. Because while it is rare and the optimistic approach works fine in most cases, it certainly *can* become a problem if the optimistic I'm normally not going to sleep thing ends up not being sufficiently accurate. So don't get me wrong - I think the whole add debug code to find places where we might have issues was well worth it, and resulted in improvements. But once the low-hanging fruit and the core code that everybody hits has been fixed, and people cannot apparently even be bothered with the other cases it finds (like the pccardd case), at that point the value of the debug code becomes rather less obvious. And the downsides become bigger. The pccardd example is an example of legacy use of our old and original semantics of how the whole nested sleep was supposed to work. And it *does* work. It's not a bug. It's how things have worked time immemorial, and clearly nobody is really willing to bother with changing working - but legacy - cardbus code. And at that point, I think the debug code is actually *wrong*, and causes more problems than it fixes. And debug code that causes more problems that it fixes should either be removed, or improved to the point where the problems go away. The improved part might be about saying it's actually perfectly _fine_ to have nested sleeps, as long as it is truly rare that the nested sleep actually sleeps. And thus make the debug code really test that it's *rare*, rather than test that it never happens. Warn if it happens more than a couple of times a second for any particular process, or something like that. Hmm? Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote: PeterZ, please don't make debugging patches like this. Ever again. Because this was just stupid, and it took me too long to realize that despite the warning being shut up, the debug patch was still actively doing bad bad things. Understood. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On 01/29, Linus Torvalds wrote: > > (a) it actually changes behavior for a debug vs non-debug kernel, > which is a really bad idea to begin with Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP too... Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On 01/29, Linus Torvalds wrote: (a) it actually changes behavior for a debug vs non-debug kernel, which is a really bad idea to begin with Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP too... Oleg. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thursday, January 29, 2015 05:25:07 PM Linus Torvalds wrote: > On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds > wrote: > > > > The WARN() was already changed to a WARN_ONCE(). > > Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up > always happening. > > So I think the right fix is to: > > - warn once like we do > > - but *not* do that __set_current_state() which was always total crap anyway > > Why do I say "total crap"? Because of two independent issues: > > (a) it actually changes behavior for a debug vs non-debug kernel, > which is a really bad idea to begin with > > (b) it's really wrong. The whole "nested sleep" case was never a > major bug to begin with, just a possible inefficiency where constant > nested sleeps would possibly make the outer sleep not sleep. But that > "could possibly make" case was the unlikely case, and the debug patch > made it happen *all* the time by explicitly setting things running. > > So I think the proper patch is the attached. I applied the patch and compiled the kernel with CONFIG_DEBUG_ATOMIC_SLEEP set again. The warning is back, so the DEBUG_ATOMIC_SLEEP thing appears to be working (and I don't really care about the freaking warning anyway), but the pccardd load issue is gone, so the patch fixes the problem for me. Thanks! Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, Jan 29, 2015 at 5:25 PM, Linus Torvalds wrote: > > Ingo, maybe you'd want to apply this through the scheduler tree, the > way you already did the WARN_ONCE() thing. Side note: I can obviously just commit it myself, but for things that have obvious maintainers I tend to try to try to push the changes through channels. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote: > On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki wrote: > > On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote: > >> > >> Yeah, but the debug check is triggering worse behavior, requiring > >> bisecting back to the debug commit. > > > > Yes, it is. > > > > So I'm wondering is anyone is working on fixing this in any way? > > > > It kind of sucks when this is happening on an otherwise perfectly usable > > old(ish) machine ... > > The WARN() was already changed to a WARN_ONCE(). > > So that debug check doesn't cause problems any more. If somebody is > bisecting something else, and the WARN() is a problem for those > intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP > should get you past that point. > > IOW, this really shouldn't be an issue. > > Does the pccard thing still not work? Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP unset, the problem with 99+% CPU load from pccardd goes away, so thanks for the hint. Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote: > > The WARN() was already changed to a WARN_ONCE(). Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up always happening. So I think the right fix is to: - warn once like we do - but *not* do that __set_current_state() which was always total crap anyway Why do I say "total crap"? Because of two independent issues: (a) it actually changes behavior for a debug vs non-debug kernel, which is a really bad idea to begin with (b) it's really wrong. The whole "nested sleep" case was never a major bug to begin with, just a possible inefficiency where constant nested sleeps would possibly make the outer sleep not sleep. But that "could possibly make" case was the unlikely case, and the debug patch made it happen *all* the time by explicitly setting things running. So I think the proper patch is the attached. The comment is also crap. The comment says "Blocking primitives will set (and therefore destroy) current->state [...]" but the reality is that they *may* set it, and only in the unlikely slow-path where they actually block. So doing this in "__may_sleep()" is just bogus and horrible horrible crap. It turns the "harmless ugliness" into a real *harmful* bug. The key word of "__may_sleep()" is that "MAY" part. It's a debug thing to make relatively rare cases show up. PeterZ, please don't make "debugging" patches like this. Ever again. Because this was just stupid, and it took me too long to realize that despite the warning being shut up, the debug patch was still actively doing bad bad things. Ingo, maybe you'd want to apply this through the scheduler tree, the way you already did the WARN_ONCE() thing. Bruno, does this finally actually fix your pccard thing? Linus kernel/sched/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c0accc00566e..76aaf5c61114 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7292,13 +7292,12 @@ void __might_sleep(const char *file, int line, int preempt_offset) * since we will exit with TASK_RUNNING make sure we enter with it, * otherwise we will destroy state. */ - if (WARN_ONCE(current->state != TASK_RUNNING, + WARN_ONCE(current->state != TASK_RUNNING, "do not call blocking ops when !TASK_RUNNING; " "state=%lx set at [<%p>] %pS\n", current->state, (void *)current->task_state_change, - (void *)current->task_state_change)) - __set_current_state(TASK_RUNNING); + (void *)current->task_state_change); ___might_sleep(file, line, preempt_offset); }
Re: Linux 3.19-rc5
On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki wrote: > On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote: >> >> Yeah, but the debug check is triggering worse behavior, requiring >> bisecting back to the debug commit. > > Yes, it is. > > So I'm wondering is anyone is working on fixing this in any way? > > It kind of sucks when this is happening on an otherwise perfectly usable > old(ish) machine ... The WARN() was already changed to a WARN_ONCE(). So that debug check doesn't cause problems any more. If somebody is bisecting something else, and the WARN() is a problem for those intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP should get you past that point. IOW, this really shouldn't be an issue. Does the pccard thing still not work? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote: > On 01/21/2015 05:12 PM, Davidlohr Bueso wrote: > > On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote: > >> On Wed, 21 January 2015 Bruno Prémont wrote: > >>> On Tue, 20 January 2015 Linus Torvalds wrote: > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: > > > > No idea yet which rc is the offender (nor exact patch), but on my not > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads > > converting my laptop into a heater. > > [...] > > >> Bisecting to the end did point me at (the warning traces produced in great > >> quantities might not be the very same issue as the abusive CPU usage, but > >> certainly look very related): > >> [CCing people on CC for the patch] > >> > >> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0 > >> Author: Peter Zijlstra > >> Date: Wed Sep 24 10:18:55 2014 +0200 > > [...] > > >> Which does produce the following trace (hand-copied most important parts > >> of it): > >> Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 > >> __might_sleep+0x143/0x170 > >> do not call blocking ops when !TASK_RUNNING; state=1 set at [] > >> pccardd+0xa0/0x3e0 > >> ... > >> Call trace: > >> ... > >> __might_sleep+0x143/0x170 > >> ? pccardd+0xa0/0x3e0 > >> ? pccardd+0xa0/0x3e0 > >> mutex_lock+0x17/0x2a > >> pccardd+0xe9/0x3e0 > >> ? pcmcia_socket_uevent+0x30/0x30 > >> > >> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the > >> structure > >> Peter's patch wants to warn about. > > > > Yeah setting current to interruptable so early in the game is bogus. It > > should be set after unlocking the skt_mutex. > > Yeah, but the debug check is triggering worse behavior, requiring > bisecting back to the debug commit. Yes, it is. So I'm wondering is anyone is working on fixing this in any way? It kind of sucks when this is happening on an otherwise perfectly usable old(ish) machine ... Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds torva...@linux-foundation.org wrote: The WARN() was already changed to a WARN_ONCE(). Oh, but I notice that the __set_current_state(TASK_RUNNING) ends up always happening. So I think the right fix is to: - warn once like we do - but *not* do that __set_current_state() which was always total crap anyway Why do I say total crap? Because of two independent issues: (a) it actually changes behavior for a debug vs non-debug kernel, which is a really bad idea to begin with (b) it's really wrong. The whole nested sleep case was never a major bug to begin with, just a possible inefficiency where constant nested sleeps would possibly make the outer sleep not sleep. But that could possibly make case was the unlikely case, and the debug patch made it happen *all* the time by explicitly setting things running. So I think the proper patch is the attached. The comment is also crap. The comment says Blocking primitives will set (and therefore destroy) current-state [...] but the reality is that they *may* set it, and only in the unlikely slow-path where they actually block. So doing this in __may_sleep() is just bogus and horrible horrible crap. It turns the harmless ugliness into a real *harmful* bug. The key word of __may_sleep() is that MAY part. It's a debug thing to make relatively rare cases show up. PeterZ, please don't make debugging patches like this. Ever again. Because this was just stupid, and it took me too long to realize that despite the warning being shut up, the debug patch was still actively doing bad bad things. Ingo, maybe you'd want to apply this through the scheduler tree, the way you already did the WARN_ONCE() thing. Bruno, does this finally actually fix your pccard thing? Linus kernel/sched/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c0accc00566e..76aaf5c61114 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7292,13 +7292,12 @@ void __might_sleep(const char *file, int line, int preempt_offset) * since we will exit with TASK_RUNNING make sure we enter with it, * otherwise we will destroy state. */ - if (WARN_ONCE(current-state != TASK_RUNNING, + WARN_ONCE(current-state != TASK_RUNNING, do not call blocking ops when !TASK_RUNNING; state=%lx set at [%p] %pS\n, current-state, (void *)current-task_state_change, - (void *)current-task_state_change)) - __set_current_state(TASK_RUNNING); + (void *)current-task_state_change); ___might_sleep(file, line, preempt_offset); }
Re: Linux 3.19-rc5
On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote: Yeah, but the debug check is triggering worse behavior, requiring bisecting back to the debug commit. Yes, it is. So I'm wondering is anyone is working on fixing this in any way? It kind of sucks when this is happening on an otherwise perfectly usable old(ish) machine ... The WARN() was already changed to a WARN_ONCE(). So that debug check doesn't cause problems any more. If somebody is bisecting something else, and the WARN() is a problem for those intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP should get you past that point. IOW, this really shouldn't be an issue. Does the pccard thing still not work? Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote: On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote: Yeah, but the debug check is triggering worse behavior, requiring bisecting back to the debug commit. Yes, it is. So I'm wondering is anyone is working on fixing this in any way? It kind of sucks when this is happening on an otherwise perfectly usable old(ish) machine ... The WARN() was already changed to a WARN_ONCE(). So that debug check doesn't cause problems any more. If somebody is bisecting something else, and the WARN() is a problem for those intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP should get you past that point. IOW, this really shouldn't be an issue. Does the pccard thing still not work? Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP unset, the problem with 99+% CPU load from pccardd goes away, so thanks for the hint. Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thu, Jan 29, 2015 at 5:25 PM, Linus Torvalds torva...@linux-foundation.org wrote: Ingo, maybe you'd want to apply this through the scheduler tree, the way you already did the WARN_ONCE() thing. Side note: I can obviously just commit it myself, but for things that have obvious maintainers I tend to try to try to push the changes through channels. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote: On 01/21/2015 05:12 PM, Davidlohr Bueso wrote: On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote: On Wed, 21 January 2015 Bruno Prémont wrote: On Tue, 20 January 2015 Linus Torvalds wrote: On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: No idea yet which rc is the offender (nor exact patch), but on my not so recent UP laptop with a pccard slot I have 2 pccardd kernel threads converting my laptop into a heater. [...] Bisecting to the end did point me at (the warning traces produced in great quantities might not be the very same issue as the abusive CPU usage, but certainly look very related): [CCing people on CC for the patch] commit 8eb23b9f35aae413140d3fda766a98092c21e9b0 Author: Peter Zijlstra pet...@infradead.org Date: Wed Sep 24 10:18:55 2014 +0200 [...] Which does produce the following trace (hand-copied most important parts of it): Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170 do not call blocking ops when !TASK_RUNNING; state=1 set at [c1436390] pccardd+0xa0/0x3e0 ... Call trace: ... __might_sleep+0x143/0x170 ? pccardd+0xa0/0x3e0 ? pccardd+0xa0/0x3e0 mutex_lock+0x17/0x2a pccardd+0xe9/0x3e0 ? pcmcia_socket_uevent+0x30/0x30 pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure Peter's patch wants to warn about. Yeah setting current to interruptable so early in the game is bogus. It should be set after unlocking the skt_mutex. Yeah, but the debug check is triggering worse behavior, requiring bisecting back to the debug commit. Yes, it is. So I'm wondering is anyone is working on fixing this in any way? It kind of sucks when this is happening on an otherwise perfectly usable old(ish) machine ... Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Thursday, January 29, 2015 05:25:07 PM Linus Torvalds wrote: On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds torva...@linux-foundation.org wrote: The WARN() was already changed to a WARN_ONCE(). Oh, but I notice that the __set_current_state(TASK_RUNNING) ends up always happening. So I think the right fix is to: - warn once like we do - but *not* do that __set_current_state() which was always total crap anyway Why do I say total crap? Because of two independent issues: (a) it actually changes behavior for a debug vs non-debug kernel, which is a really bad idea to begin with (b) it's really wrong. The whole nested sleep case was never a major bug to begin with, just a possible inefficiency where constant nested sleeps would possibly make the outer sleep not sleep. But that could possibly make case was the unlikely case, and the debug patch made it happen *all* the time by explicitly setting things running. So I think the proper patch is the attached. I applied the patch and compiled the kernel with CONFIG_DEBUG_ATOMIC_SLEEP set again. The warning is back, so the DEBUG_ATOMIC_SLEEP thing appears to be working (and I don't really care about the freaking warning anyway), but the pccardd load issue is gone, so the patch fixes the problem for me. Thanks! Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On 01/21/2015 05:12 PM, Davidlohr Bueso wrote: > On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote: >> On Wed, 21 January 2015 Bruno Prémont wrote: >>> On Tue, 20 January 2015 Linus Torvalds wrote: On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: > > No idea yet which rc is the offender (nor exact patch), but on my not > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads > converting my laptop into a heater. [...] >> Bisecting to the end did point me at (the warning traces produced in great >> quantities might not be the very same issue as the abusive CPU usage, but >> certainly look very related): >> [CCing people on CC for the patch] >> >> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0 >> Author: Peter Zijlstra >> Date: Wed Sep 24 10:18:55 2014 +0200 [...] >> Which does produce the following trace (hand-copied most important parts of >> it): >> Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 >> __might_sleep+0x143/0x170 >> do not call blocking ops when !TASK_RUNNING; state=1 set at [] >> pccardd+0xa0/0x3e0 >> ... >> Call trace: >> ... >> __might_sleep+0x143/0x170 >> ? pccardd+0xa0/0x3e0 >> ? pccardd+0xa0/0x3e0 >> mutex_lock+0x17/0x2a >> pccardd+0xe9/0x3e0 >> ? pcmcia_socket_uevent+0x30/0x30 >> >> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure >> Peter's patch wants to warn about. > > Yeah setting current to interruptable so early in the game is bogus. It > should be set after unlocking the skt_mutex. Yeah, but the debug check is triggering worse behavior, requiring bisecting back to the debug commit. How does the might_sleep() check here guarantee the task won't sleep? Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote: > On Wed, 21 January 2015 Bruno Prémont wrote: > > On Tue, 20 January 2015 Linus Torvalds wrote: > > > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: > > > > > > > > No idea yet which rc is the offender (nor exact patch), but on my not > > > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads > > > > converting my laptop into a heater. > > > > > > > > lspci for affected nodes: > > > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus > > > > Controller [1217:7113] (rev 20) > > > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus > > > > Controller [1217:7113] (rev 20) > > > > > > > > Very basics I have, before I attempt any bisection: > > > > > > Hmm. I'm not seeing anything recent changing anything in this area, so > > > I suspect that unless somebody else steps up and says "Ahh, that > > > sounds like xyz", your bisection is the best option. > > Bisecting to the end did point me at (the warning traces produced in great > quantities might not be the very same issue as the abusive CPU usage, but > certainly look very related): > [CCing people on CC for the patch] > > commit 8eb23b9f35aae413140d3fda766a98092c21e9b0 > Author: Peter Zijlstra > Date: Wed Sep 24 10:18:55 2014 +0200 > > sched: Debug nested sleeps > > Validate we call might_sleep() with TASK_RUNNING, which catches places > where we nest blocking primitives, eg. mutex usage in a wait loop. > > Since all blocking is arranged through task_struct::state, nesting > this will cause the inner primitive to set TASK_RUNNING and the outer > will thus not block. > > Another observed problem is calling a blocking function from > schedule()->sched_submit_work()->blk_schedule_flush_plug() which will > then destroy the task state for the actual __schedule() call that > comes after it. > > Signed-off-by: Peter Zijlstra (Intel) > Cc: t...@linutronix.de > Cc: ilya.dryo...@inktank.com > Cc: umgwanakikb...@gmail.com > Cc: o...@redhat.com > Cc: Linus Torvalds > Link: http://lkml.kernel.org/r/20140924082242.591637...@infradead.org > Signed-off-by: Ingo Molnar > > Which does produce the following trace (hand-copied most important parts of > it): > Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170 > do not call blocking ops when !TASK_RUNNING; state=1 set at [] > pccardd+0xa0/0x3e0 > ... > Call trace: > ... > __might_sleep+0x143/0x170 > ? pccardd+0xa0/0x3e0 > ? pccardd+0xa0/0x3e0 > mutex_lock+0x17/0x2a > pccardd+0xe9/0x3e0 > ? pcmcia_socket_uevent+0x30/0x30 > > pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure > Peter's patch wants to warn about. Yeah setting current to interruptable so early in the game is bogus. It should be set after unlocking the skt_mutex. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Wed, 21 January 2015 Bruno Prémont wrote: > On Tue, 20 January 2015 Linus Torvalds wrote: > > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: > > > > > > No idea yet which rc is the offender (nor exact patch), but on my not > > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads > > > converting my laptop into a heater. > > > > > > lspci for affected nodes: > > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus > > > Controller [1217:7113] (rev 20) > > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus > > > Controller [1217:7113] (rev 20) > > > > > > Very basics I have, before I attempt any bisection: > > > > Hmm. I'm not seeing anything recent changing anything in this area, so > > I suspect that unless somebody else steps up and says "Ahh, that > > sounds like xyz", your bisection is the best option. Bisecting to the end did point me at (the warning traces produced in great quantities might not be the very same issue as the abusive CPU usage, but certainly look very related): [CCing people on CC for the patch] commit 8eb23b9f35aae413140d3fda766a98092c21e9b0 Author: Peter Zijlstra Date: Wed Sep 24 10:18:55 2014 +0200 sched: Debug nested sleeps Validate we call might_sleep() with TASK_RUNNING, which catches places where we nest blocking primitives, eg. mutex usage in a wait loop. Since all blocking is arranged through task_struct::state, nesting this will cause the inner primitive to set TASK_RUNNING and the outer will thus not block. Another observed problem is calling a blocking function from schedule()->sched_submit_work()->blk_schedule_flush_plug() which will then destroy the task state for the actual __schedule() call that comes after it. Signed-off-by: Peter Zijlstra (Intel) Cc: t...@linutronix.de Cc: ilya.dryo...@inktank.com Cc: umgwanakikb...@gmail.com Cc: o...@redhat.com Cc: Linus Torvalds Link: http://lkml.kernel.org/r/20140924082242.591637...@infradead.org Signed-off-by: Ingo Molnar Which does produce the following trace (hand-copied most important parts of it): Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170 do not call blocking ops when !TASK_RUNNING; state=1 set at [] pccardd+0xa0/0x3e0 ... Call trace: ... __might_sleep+0x143/0x170 ? pccardd+0xa0/0x3e0 ? pccardd+0xa0/0x3e0 mutex_lock+0x17/0x2a pccardd+0xe9/0x3e0 ? pcmcia_socket_uevent+0x30/0x30 pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure Peter's patch wants to warn about. Bruno -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Tue, 20 January 2015 Linus Torvalds wrote: > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: > > > > No idea yet which rc is the offender (nor exact patch), but on my not > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads > > converting my laptop into a heater. > > > > lspci for affected nodes: > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus > > Controller [1217:7113] (rev 20) > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus > > Controller [1217:7113] (rev 20) > > > > Very basics I have, before I attempt any bisection: > > Hmm. I'm not seeing anything recent changing anything in this area, so > I suspect that unless somebody else steps up and says "Ahh, that > sounds like xyz", your bisection is the best option. I've done most of the bisection and ended up in the scheduler changes. CCing Peter. A few iterations from the end kernel is warning about (might?)sleeping during locking or so (scrolling too fast to read end making access to console hard). My current bisection log is: git bisect start # bad: [117af36e3bceb4fceb1c63489d6f3d94ed259a4c] Apple-GMUX: inhibit VGA arbitration changes git bisect bad 117af36e3bceb4fceb1c63489d6f3d94ed259a4c # good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18 git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d # bad: [c0222ac086669a631814bbf857f8c8023452a4d7] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus git bisect bad c0222ac086669a631814bbf857f8c8023452a4d7 # bad: [2183a58803c2bbd87c2d0057eed6779ec4718d4d] Merge tag 'media/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media git bisect bad 2183a58803c2bbd87c2d0057eed6779ec4718d4d # good: [6da314122ddc11936c6f054753bbb956a499d020] Merge tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc git bisect good 6da314122ddc11936c6f054753bbb956a499d020 # bad: [cbfe0de303a55ed96d8831c2d5f56f8131cd6612] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs git bisect bad cbfe0de303a55ed96d8831c2d5f56f8131cd6612 # bad: [9e66645d72d3c395da92b0f8855c787f4b5f0e89] Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad 9e66645d72d3c395da92b0f8855c787f4b5f0e89 # good: [5706ffd045c3810912c4982357d7daa721af3464] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect good 5706ffd045c3810912c4982357d7daa721af3464 # bad: [a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa # bad: [acb32132ec0433c03bed750f3e9508dc29db0328] sched/deadline: Add deadline rq status print git bisect bad acb32132ec0433c03bed750f3e9508dc29db0328 # good: [1029a2b52c09e479fd7b07275812ad97868c0fb0] sched, exit: Deal with nested sleeps git bisect good 1029a2b52c09e479fd7b07275812ad97868c0fb0 That means, the remaining commits would be: commit acb32132ec0433c03bed750f3e9508dc29db0328 Author: Wanpeng Li Date: Fri Oct 31 06:39:33 2014 +0800 sched/deadline: Add deadline rq status print ... commit e23738a7300a7591a57a22f47b813fd1b53ec404 Author: Peter Zijlstra Date: Wed Sep 24 10:18:50 2014 +0200 sched, inotify: Deal with nested sleeps It looks like pccardd might be doing the wrong thing for locking/sleeping/waiting. Trying to complete bisection. Bruno -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote: On Wed, 21 January 2015 Bruno Prémont wrote: On Tue, 20 January 2015 Linus Torvalds wrote: On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: No idea yet which rc is the offender (nor exact patch), but on my not so recent UP laptop with a pccard slot I have 2 pccardd kernel threads converting my laptop into a heater. lspci for affected nodes: 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) Very basics I have, before I attempt any bisection: Hmm. I'm not seeing anything recent changing anything in this area, so I suspect that unless somebody else steps up and says Ahh, that sounds like xyz, your bisection is the best option. Bisecting to the end did point me at (the warning traces produced in great quantities might not be the very same issue as the abusive CPU usage, but certainly look very related): [CCing people on CC for the patch] commit 8eb23b9f35aae413140d3fda766a98092c21e9b0 Author: Peter Zijlstra pet...@infradead.org Date: Wed Sep 24 10:18:55 2014 +0200 sched: Debug nested sleeps Validate we call might_sleep() with TASK_RUNNING, which catches places where we nest blocking primitives, eg. mutex usage in a wait loop. Since all blocking is arranged through task_struct::state, nesting this will cause the inner primitive to set TASK_RUNNING and the outer will thus not block. Another observed problem is calling a blocking function from schedule()-sched_submit_work()-blk_schedule_flush_plug() which will then destroy the task state for the actual __schedule() call that comes after it. Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Cc: t...@linutronix.de Cc: ilya.dryo...@inktank.com Cc: umgwanakikb...@gmail.com Cc: o...@redhat.com Cc: Linus Torvalds torva...@linux-foundation.org Link: http://lkml.kernel.org/r/20140924082242.591637...@infradead.org Signed-off-by: Ingo Molnar mi...@kernel.org Which does produce the following trace (hand-copied most important parts of it): Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170 do not call blocking ops when !TASK_RUNNING; state=1 set at [c1436390] pccardd+0xa0/0x3e0 ... Call trace: ... __might_sleep+0x143/0x170 ? pccardd+0xa0/0x3e0 ? pccardd+0xa0/0x3e0 mutex_lock+0x17/0x2a pccardd+0xe9/0x3e0 ? pcmcia_socket_uevent+0x30/0x30 pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure Peter's patch wants to warn about. Yeah setting current to interruptable so early in the game is bogus. It should be set after unlocking the skt_mutex. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Wed, 21 January 2015 Bruno Prémont wrote: On Tue, 20 January 2015 Linus Torvalds wrote: On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: No idea yet which rc is the offender (nor exact patch), but on my not so recent UP laptop with a pccard slot I have 2 pccardd kernel threads converting my laptop into a heater. lspci for affected nodes: 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) Very basics I have, before I attempt any bisection: Hmm. I'm not seeing anything recent changing anything in this area, so I suspect that unless somebody else steps up and says Ahh, that sounds like xyz, your bisection is the best option. Bisecting to the end did point me at (the warning traces produced in great quantities might not be the very same issue as the abusive CPU usage, but certainly look very related): [CCing people on CC for the patch] commit 8eb23b9f35aae413140d3fda766a98092c21e9b0 Author: Peter Zijlstra pet...@infradead.org Date: Wed Sep 24 10:18:55 2014 +0200 sched: Debug nested sleeps Validate we call might_sleep() with TASK_RUNNING, which catches places where we nest blocking primitives, eg. mutex usage in a wait loop. Since all blocking is arranged through task_struct::state, nesting this will cause the inner primitive to set TASK_RUNNING and the outer will thus not block. Another observed problem is calling a blocking function from schedule()-sched_submit_work()-blk_schedule_flush_plug() which will then destroy the task state for the actual __schedule() call that comes after it. Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Cc: t...@linutronix.de Cc: ilya.dryo...@inktank.com Cc: umgwanakikb...@gmail.com Cc: o...@redhat.com Cc: Linus Torvalds torva...@linux-foundation.org Link: http://lkml.kernel.org/r/20140924082242.591637...@infradead.org Signed-off-by: Ingo Molnar mi...@kernel.org Which does produce the following trace (hand-copied most important parts of it): Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170 do not call blocking ops when !TASK_RUNNING; state=1 set at [c1436390] pccardd+0xa0/0x3e0 ... Call trace: ... __might_sleep+0x143/0x170 ? pccardd+0xa0/0x3e0 ? pccardd+0xa0/0x3e0 mutex_lock+0x17/0x2a pccardd+0xe9/0x3e0 ? pcmcia_socket_uevent+0x30/0x30 pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure Peter's patch wants to warn about. Bruno -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On 01/21/2015 05:12 PM, Davidlohr Bueso wrote: On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote: On Wed, 21 January 2015 Bruno Prémont wrote: On Tue, 20 January 2015 Linus Torvalds wrote: On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: No idea yet which rc is the offender (nor exact patch), but on my not so recent UP laptop with a pccard slot I have 2 pccardd kernel threads converting my laptop into a heater. [...] Bisecting to the end did point me at (the warning traces produced in great quantities might not be the very same issue as the abusive CPU usage, but certainly look very related): [CCing people on CC for the patch] commit 8eb23b9f35aae413140d3fda766a98092c21e9b0 Author: Peter Zijlstra pet...@infradead.org Date: Wed Sep 24 10:18:55 2014 +0200 [...] Which does produce the following trace (hand-copied most important parts of it): Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170 do not call blocking ops when !TASK_RUNNING; state=1 set at [c1436390] pccardd+0xa0/0x3e0 ... Call trace: ... __might_sleep+0x143/0x170 ? pccardd+0xa0/0x3e0 ? pccardd+0xa0/0x3e0 mutex_lock+0x17/0x2a pccardd+0xe9/0x3e0 ? pcmcia_socket_uevent+0x30/0x30 pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure Peter's patch wants to warn about. Yeah setting current to interruptable so early in the game is bogus. It should be set after unlocking the skt_mutex. Yeah, but the debug check is triggering worse behavior, requiring bisecting back to the debug commit. How does the might_sleep() check here guarantee the task won't sleep? Regards, Peter Hurley -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Tue, 20 January 2015 Linus Torvalds wrote: On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: No idea yet which rc is the offender (nor exact patch), but on my not so recent UP laptop with a pccard slot I have 2 pccardd kernel threads converting my laptop into a heater. lspci for affected nodes: 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) Very basics I have, before I attempt any bisection: Hmm. I'm not seeing anything recent changing anything in this area, so I suspect that unless somebody else steps up and says Ahh, that sounds like xyz, your bisection is the best option. I've done most of the bisection and ended up in the scheduler changes. CCing Peter. A few iterations from the end kernel is warning about (might?)sleeping during locking or so (scrolling too fast to read end making access to console hard). My current bisection log is: git bisect start # bad: [117af36e3bceb4fceb1c63489d6f3d94ed259a4c] Apple-GMUX: inhibit VGA arbitration changes git bisect bad 117af36e3bceb4fceb1c63489d6f3d94ed259a4c # good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18 git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d # bad: [c0222ac086669a631814bbf857f8c8023452a4d7] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus git bisect bad c0222ac086669a631814bbf857f8c8023452a4d7 # bad: [2183a58803c2bbd87c2d0057eed6779ec4718d4d] Merge tag 'media/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media git bisect bad 2183a58803c2bbd87c2d0057eed6779ec4718d4d # good: [6da314122ddc11936c6f054753bbb956a499d020] Merge tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc git bisect good 6da314122ddc11936c6f054753bbb956a499d020 # bad: [cbfe0de303a55ed96d8831c2d5f56f8131cd6612] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs git bisect bad cbfe0de303a55ed96d8831c2d5f56f8131cd6612 # bad: [9e66645d72d3c395da92b0f8855c787f4b5f0e89] Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad 9e66645d72d3c395da92b0f8855c787f4b5f0e89 # good: [5706ffd045c3810912c4982357d7daa721af3464] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect good 5706ffd045c3810912c4982357d7daa721af3464 # bad: [a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa # bad: [acb32132ec0433c03bed750f3e9508dc29db0328] sched/deadline: Add deadline rq status print git bisect bad acb32132ec0433c03bed750f3e9508dc29db0328 # good: [1029a2b52c09e479fd7b07275812ad97868c0fb0] sched, exit: Deal with nested sleeps git bisect good 1029a2b52c09e479fd7b07275812ad97868c0fb0 That means, the remaining commits would be: commit acb32132ec0433c03bed750f3e9508dc29db0328 Author: Wanpeng Li wanpeng...@linux.intel.com Date: Fri Oct 31 06:39:33 2014 +0800 sched/deadline: Add deadline rq status print ... commit e23738a7300a7591a57a22f47b813fd1b53ec404 Author: Peter Zijlstra pet...@infradead.org Date: Wed Sep 24 10:18:50 2014 +0200 sched, inotify: Deal with nested sleeps It looks like pccardd might be doing the wrong thing for locking/sleeping/waiting. Trying to complete bisection. Bruno -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote: > > No idea yet which rc is the offender (nor exact patch), but on my not > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads > converting my laptop into a heater. > > lspci for affected nodes: > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus > Controller [1217:7113] (rev 20) > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus > Controller [1217:7113] (rev 20) > > Very basics I have, before I attempt any bisection: Hmm. I'm not seeing anything recent changing anything in this area, so I suspect that unless somebody else steps up and says "Ahh, that sounds like xyz", your bisection is the best option. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sun, 18 January 2015 Linus Torvalds wrote: > Another week, another -rc. > > Fairly normal release, although I'd wish that by rc5 we'd have calmed > down even further. But no, with some of the driver tree merges in > particular, this is actually larger than rc4 was. > > That said, it's not like there is anything particularly scary in here. > > The arm64 vm bug that I mentioned as pending in the rc4 notes got > fixed within a day of that previous rc release, and the rest looks > pretty standard. Mostly drivers (networking, usb, scsi target, block > layer, mmc, tty etc), but also arch updates (arm, x86, s390 and some > tiny powerpc fixes), some filesystem updates (fuse and nfs), tracing > fixes, and some perf tooling fixes. > > Shortlog with the details appended. > > Go forth and test. No idea yet which rc is the offender (nor exact patch), but on my not so recent UP laptop with a pccard slot I have 2 pccardd kernel threads converting my laptop into a heater. lspci for affected nodes: 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) Very basics I have, before I attempt any bisection: cat /proc/$(pidof pccardd)/stack [] pccardd+0x1a5/0x3e0 [] kthread+0xa0/0xc0 [] ret_from_kernel_thread+0x20/0x30 [] 0x Doing objdump on drivers/pcmcia/cs.o in which pccardd() is defined it seems pccardd is stuck in try_to_freeze_unsafe(). Does this kind of behavior ring a bell? I can unbind the two yenta_cardbus devices from yenta_cardbus to stop the CPU usage. Possibly important config option: # CONFIG_SMP is not set Objdump below: static int pccardd(void *__skt) { ca0: 55 push %ebp ca1: 89 e5 mov%esp,%ebp ca3: 57 push %edi ca4: 56 push %esi ca5: 53 push %ebx ca6: 89 c3 mov%eax,%ebx ca8: 83 ec 24sub$0x24,%esp DECLARE_PER_CPU(struct task_struct *, current_task); static __always_inline struct task_struct *get_current(void) { return this_cpu_read_stable(current_task); cab: a1 00 00 00 00 mov0x0,%eax struct pcmcia_socket *skt = __skt; int ret; skt->thread = current; cb0: 89 83 e4 00 00 00 mov%eax,0xe4(%ebx) skt->socket = dead_socket; cb6: a1 00 00 00 00 mov0x0,%eax skt->ops->init(skt); cbb: 8b 93 cc 00 00 00 mov0xcc(%ebx),%edx { struct pcmcia_socket *skt = __skt; int ret; skt->thread = current; skt->socket = dead_socket; cc1: 89 43 04mov%eax,0x4(%ebx) cc4: a1 04 00 00 00 mov0x4,%eax cc9: 89 43 08mov%eax,0x8(%ebx) ccc: a1 08 00 00 00 mov0x8,%eax cd1: 89 43 0cmov%eax,0xc(%ebx) skt->ops->init(skt); cd4: 89 d8 mov%ebx,%eax cd6: ff 12 call *(%edx) skt->ops->set_socket(skt, >socket); cd8: 8b 8b cc 00 00 00 mov0xcc(%ebx),%ecx cde: 8d 53 04lea0x4(%ebx),%edx ce1: 89 d8 mov%ebx,%eax ce3: ff 51 0ccall *0xc(%ecx) /* register with the device core */ ret = device_register(>dev); ce6: 8d 83 2c 01 00 00 lea0x12c(%ebx),%eax cec: 89 45 e0mov%eax,-0x20(%ebp) cef: e8 fc ff ff ff call cf0 if (ret) { cf4: 85 c0 test %eax,%eax cf6: 0f 85 d4 02 00 00 jnefd0 "PCMCIA: unable to register socket\n"); skt->thread = NULL; complete(>thread_done); return 0; } ret = pccard_sysfs_add_socket(>dev); cfc: 8b 45 e0mov-0x20(%ebp),%eax cff: e8 fc ff ff ff call d00 if (ret) d04: 85 c0 test %eax,%eax "PCMCIA: unable to register socket\n"); skt->thread = NULL; complete(>thread_done); return 0; } ret = pccard_sysfs_add_socket(>dev); d06: 89 45 e4mov%eax,-0x1c(%ebp) if (ret) d09: 0f 85 01 03 00 00 jne1010 dev_warn(>dev, "err %d adding socket attributes\n", ret); complete(>thread_done); d0f: 8d 83 e8 00 00 00 lea0xe8(%ebx),%eax d15: e8 fc ff ff ff call d16 /* wait for userspace to catch up */
Re: Linux 3.19-rc5
On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont bonb...@linux-vserver.org wrote: No idea yet which rc is the offender (nor exact patch), but on my not so recent UP laptop with a pccard slot I have 2 pccardd kernel threads converting my laptop into a heater. lspci for affected nodes: 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) Very basics I have, before I attempt any bisection: Hmm. I'm not seeing anything recent changing anything in this area, so I suspect that unless somebody else steps up and says Ahh, that sounds like xyz, your bisection is the best option. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.19-rc5
On Sun, 18 January 2015 Linus Torvalds torva...@linux-foundation.org wrote: Another week, another -rc. Fairly normal release, although I'd wish that by rc5 we'd have calmed down even further. But no, with some of the driver tree merges in particular, this is actually larger than rc4 was. That said, it's not like there is anything particularly scary in here. The arm64 vm bug that I mentioned as pending in the rc4 notes got fixed within a day of that previous rc release, and the rest looks pretty standard. Mostly drivers (networking, usb, scsi target, block layer, mmc, tty etc), but also arch updates (arm, x86, s390 and some tiny powerpc fixes), some filesystem updates (fuse and nfs), tracing fixes, and some perf tooling fixes. Shortlog with the details appended. Go forth and test. No idea yet which rc is the offender (nor exact patch), but on my not so recent UP laptop with a pccard slot I have 2 pccardd kernel threads converting my laptop into a heater. lspci for affected nodes: 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller [1217:7113] (rev 20) Very basics I have, before I attempt any bisection: cat /proc/$(pidof pccardd)/stack [c143ec95] pccardd+0x1a5/0x3e0 [c10543a0] kthread+0xa0/0xc0 [c16cd840] ret_from_kernel_thread+0x20/0x30 [] 0x Doing objdump on drivers/pcmcia/cs.o in which pccardd() is defined it seems pccardd is stuck in try_to_freeze_unsafe(). Does this kind of behavior ring a bell? I can unbind the two yenta_cardbus devices from yenta_cardbus to stop the CPU usage. Possibly important config option: # CONFIG_SMP is not set Objdump below: static int pccardd(void *__skt) { ca0: 55 push %ebp ca1: 89 e5 mov%esp,%ebp ca3: 57 push %edi ca4: 56 push %esi ca5: 53 push %ebx ca6: 89 c3 mov%eax,%ebx ca8: 83 ec 24sub$0x24,%esp DECLARE_PER_CPU(struct task_struct *, current_task); static __always_inline struct task_struct *get_current(void) { return this_cpu_read_stable(current_task); cab: a1 00 00 00 00 mov0x0,%eax struct pcmcia_socket *skt = __skt; int ret; skt-thread = current; cb0: 89 83 e4 00 00 00 mov%eax,0xe4(%ebx) skt-socket = dead_socket; cb6: a1 00 00 00 00 mov0x0,%eax skt-ops-init(skt); cbb: 8b 93 cc 00 00 00 mov0xcc(%ebx),%edx { struct pcmcia_socket *skt = __skt; int ret; skt-thread = current; skt-socket = dead_socket; cc1: 89 43 04mov%eax,0x4(%ebx) cc4: a1 04 00 00 00 mov0x4,%eax cc9: 89 43 08mov%eax,0x8(%ebx) ccc: a1 08 00 00 00 mov0x8,%eax cd1: 89 43 0cmov%eax,0xc(%ebx) skt-ops-init(skt); cd4: 89 d8 mov%ebx,%eax cd6: ff 12 call *(%edx) skt-ops-set_socket(skt, skt-socket); cd8: 8b 8b cc 00 00 00 mov0xcc(%ebx),%ecx cde: 8d 53 04lea0x4(%ebx),%edx ce1: 89 d8 mov%ebx,%eax ce3: ff 51 0ccall *0xc(%ecx) /* register with the device core */ ret = device_register(skt-dev); ce6: 8d 83 2c 01 00 00 lea0x12c(%ebx),%eax cec: 89 45 e0mov%eax,-0x20(%ebp) cef: e8 fc ff ff ff call cf0 pccardd+0x50 if (ret) { cf4: 85 c0 test %eax,%eax cf6: 0f 85 d4 02 00 00 jnefd0 pccardd+0x330 PCMCIA: unable to register socket\n); skt-thread = NULL; complete(skt-thread_done); return 0; } ret = pccard_sysfs_add_socket(skt-dev); cfc: 8b 45 e0mov-0x20(%ebp),%eax cff: e8 fc ff ff ff call d00 pccardd+0x60 if (ret) d04: 85 c0 test %eax,%eax PCMCIA: unable to register socket\n); skt-thread = NULL; complete(skt-thread_done); return 0; } ret = pccard_sysfs_add_socket(skt-dev); d06: 89 45 e4mov%eax,-0x1c(%ebp) if (ret) d09: 0f 85 01 03 00 00 jne1010 pccardd+0x370 dev_warn(skt-dev, err %d adding socket attributes\n, ret); complete(skt-thread_done); d0f: 8d 83 e8 00 00 00 lea0xe8(%ebx),%eax d15: