subject:"Re\: Linux 3.19\-rc5"

Re: Linux 3.19-rc5

2015-02-06 Thread Linus Torvalds

On Fri, Feb 6, 2015 at 8:39 AM, Peter Zijlstra  wrote:
>
> I like the macro one though; I worry a wee bit about non-documented
> cases through. If someone is doing something way subtle we'll break it
> :/

Yeah, I agree. And I wonder how much we should even care. People who
do this thing by hand - rather than using the wait-event model - are
*likely* legacy stuff or just random drivers, or simply stuff that
isn't all that performance-sensitive.

So I dunno how worthwhile this all is.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-06 Thread Peter Zijlstra

On Fri, Feb 06, 2015 at 08:02:41AM -0800, Linus Torvalds wrote:
> On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra  wrote:
> >
> > Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
> > cares about that barrier, so make it go away.
> 
> I'd rather not mix this with the patch, and wonder if we should just
> do that globally with some preprocessor magic. We do have a fair
> number of "set_current_state(TASK_RUNNING)" 

138

> and at least for the
> *documented* reason for the memory barrier, all of them could/should
> be barrier-less.
> 
> So something like
> 
> if (__is_constant_p(state) && state == TASK_RUNNING)
> tsk->state = state;
> else
> set_mb(tsk->state, state);
> 
> might be more general solution than randomly doing one at a time when
> changing code around it..

Yeah, or we could do the coccinelle thing and do a mass conversion.

I like the macro one though; I worry a wee bit about non-documented
cases through. If someone is doing something way subtle we'll break it
:/


---
Subject: sched: Avoid the full memory-barrier for 
set_current_state(TASK_RUNNING)

One should never need the full memory barrier implied by
set_current_state() to set TASK_RUNNING for the documented reason of
avoiding races against wakeup.

Suggested-by: Linus Torvalds 
Signed-off-by: Peter Zijlstra (Intel) 
---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef98d2f..aea44c4eeed8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -243,6 +243,27 @@ extern char ___assert_task_state[1 - 2*!!(
((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
 (task->flags & PF_FROZEN) == 0)
 
+/*
+ * set_current_state() includes a barrier so that the write of current->state
+ * is correctly serialised wrt the caller's subsequent test of whether to
+ * actually sleep:
+ *
+ * set_current_state(TASK_UNINTERRUPTIBLE);
+ * if (do_i_need_to_sleep())
+ * schedule();
+ *
+ * If the caller does not need such serialisation then use
+ * __set_current_state(). This is always true for TASK_RUNNING since
+ * there is no race against wakeup -- both write the same value.
+ */
+#define ___set_current_state(state)\
+do {   \
+   if (__is_constant_p(state) && (state) == TASK_RUNNING)  \
+   current->state = (state);   \
+   else\
+   set_mb(current->state, (state));\
+} while (0)
+
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 
 #define __set_task_state(tsk, state_value) \
@@ -256,17 +277,6 @@ extern char ___assert_task_state[1 - 2*!!(
set_mb((tsk)->state, (state_value));\
} while (0)
 
-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- * set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
- *
- * If the caller does not need such serialisation then use 
__set_current_state()
- */
 #define __set_current_state(state_value)   \
do {\
current->task_state_change = _THIS_IP_; \
@@ -275,7 +285,7 @@ extern char ___assert_task_state[1 - 2*!!(
 #define set_current_state(state_value) \
do {\
current->task_state_change = _THIS_IP_; \
-   set_mb(current->state, (state_value));  \
+   ___set_current_state(state_value);  \
} while (0)
 
 #else
@@ -285,21 +295,10 @@ extern char ___assert_task_state[1 - 2*!!(
 #define set_task_state(tsk, state_value)   \
set_mb((tsk)->state, (state_value))
 
-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- * set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
- *
- * If the caller does not need such serialisation then use 
__set_current_state()
- */
 #define __set_current_state(state_value)   \
do { current->state = (state_value); } while (0)
 #define set_current_state(state_value) \
-   set_mb(current->state, (state_value))
+   ___set_current_state(state_value);
 
 #endif
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-06 Thread Linus Torvalds

On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra  wrote:
>
> Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
> cares about that barrier, so make it go away.

I'd rather not mix this with the patch, and wonder if we should just
do that globally with some preprocessor magic. We do have a fair
number of "set_current_state(TASK_RUNNING)" and at least for the
*documented* reason for the memory barrier, all of them could/should
be barrier-less.

So something like

if (__is_constant_p(state) && state == TASK_RUNNING)
tsk->state = state;
else
set_mb(tsk->state, state);

might be more general solution than randomly doing one at a time when
changing code around it..

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-06 Thread Peter Zijlstra

On Thu, Feb 05, 2015 at 10:14:36PM +0100, Bruno Prémont wrote:
> The loop is now replaced by a single WARN() trace - I guess expected:

> From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check
> is another issue left for the future.

Yeah, something like the below will make it go away -- under the
assumption that that comment is actually correct, I don't know, the
pcmcia people should probably write a better comment :/

Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
cares about that barrier, so make it go away.

---
 drivers/pcmcia/cs.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pcmcia/cs.c b/drivers/pcmcia/cs.c
index 5292db69c426..5678e161a17d 100644
--- a/drivers/pcmcia/cs.c
+++ b/drivers/pcmcia/cs.c
@@ -635,6 +635,12 @@ static int pccardd(void *__skt)
skt->sysfs_events = 0;
spin_unlock_irqrestore(>thread_lock, flags);
 
+   /*
+* Supposedly this is a rarely contended mutex and
+* sleeping is therefore unlikely, the occasional
+* extra loop iteration is harmless.
+*/
+   sched_annotate_sleep();
mutex_lock(>skt_mutex);
if (events & SS_DETECT)
socket_detect_change(skt);
@@ -679,7 +685,7 @@ static int pccardd(void *__skt)
try_to_freeze();
}
/* make sure we are running before we exit */
-   set_current_state(TASK_RUNNING);
+   __set_current_state(TASK_RUNNING);
 
/* shut down socket, if a device is still present */
if (skt->state & SOCKET_PRESENT) {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-06 Thread Peter Zijlstra

On Thu, Feb 05, 2015 at 10:14:36PM +0100, Bruno Prémont wrote:
 The loop is now replaced by a single WARN() trace - I guess expected:

 From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check
 is another issue left for the future.

Yeah, something like the below will make it go away -- under the
assumption that that comment is actually correct, I don't know, the
pcmcia people should probably write a better comment :/

Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
cares about that barrier, so make it go away.

---
 drivers/pcmcia/cs.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pcmcia/cs.c b/drivers/pcmcia/cs.c
index 5292db69c426..5678e161a17d 100644
--- a/drivers/pcmcia/cs.c
+++ b/drivers/pcmcia/cs.c
@@ -635,6 +635,12 @@ static int pccardd(void *__skt)
skt-sysfs_events = 0;
spin_unlock_irqrestore(skt-thread_lock, flags);
 
+   /*
+* Supposedly this is a rarely contended mutex and
+* sleeping is therefore unlikely, the occasional
+* extra loop iteration is harmless.
+*/
+   sched_annotate_sleep();
mutex_lock(skt-skt_mutex);
if (events  SS_DETECT)
socket_detect_change(skt);
@@ -679,7 +685,7 @@ static int pccardd(void *__skt)
try_to_freeze();
}
/* make sure we are running before we exit */
-   set_current_state(TASK_RUNNING);
+   __set_current_state(TASK_RUNNING);
 
/* shut down socket, if a device is still present */
if (skt-state  SOCKET_PRESENT) {
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-06 Thread Linus Torvalds

On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra pet...@infradead.org wrote:

 Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
 cares about that barrier, so make it go away.

I'd rather not mix this with the patch, and wonder if we should just
do that globally with some preprocessor magic. We do have a fair
number of set_current_state(TASK_RUNNING) and at least for the
*documented* reason for the memory barrier, all of them could/should
be barrier-less.

So something like

if (__is_constant_p(state)  state == TASK_RUNNING)
tsk-state = state;
else
set_mb(tsk-state, state);

might be more general solution than randomly doing one at a time when
changing code around it..

 Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-06 Thread Peter Zijlstra

On Fri, Feb 06, 2015 at 08:02:41AM -0800, Linus Torvalds wrote:
 On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra pet...@infradead.org wrote:
 
  Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody
  cares about that barrier, so make it go away.
 
 I'd rather not mix this with the patch, and wonder if we should just
 do that globally with some preprocessor magic. We do have a fair
 number of set_current_state(TASK_RUNNING) 

138

 and at least for the
 *documented* reason for the memory barrier, all of them could/should
 be barrier-less.
 
 So something like
 
 if (__is_constant_p(state)  state == TASK_RUNNING)
 tsk-state = state;
 else
 set_mb(tsk-state, state);
 
 might be more general solution than randomly doing one at a time when
 changing code around it..

Yeah, or we could do the coccinelle thing and do a mass conversion.

I like the macro one though; I worry a wee bit about non-documented
cases through. If someone is doing something way subtle we'll break it
:/


---
Subject: sched: Avoid the full memory-barrier for 
set_current_state(TASK_RUNNING)

One should never need the full memory barrier implied by
set_current_state() to set TASK_RUNNING for the documented reason of
avoiding races against wakeup.

Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef98d2f..aea44c4eeed8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -243,6 +243,27 @@ extern char ___assert_task_state[1 - 2*!!(
((task-state  TASK_UNINTERRUPTIBLE) != 0  \
 (task-flags  PF_FROZEN) == 0)
 
+/*
+ * set_current_state() includes a barrier so that the write of current-state
+ * is correctly serialised wrt the caller's subsequent test of whether to
+ * actually sleep:
+ *
+ * set_current_state(TASK_UNINTERRUPTIBLE);
+ * if (do_i_need_to_sleep())
+ * schedule();
+ *
+ * If the caller does not need such serialisation then use
+ * __set_current_state(). This is always true for TASK_RUNNING since
+ * there is no race against wakeup -- both write the same value.
+ */
+#define ___set_current_state(state)\
+do {   \
+   if (__is_constant_p(state)  (state) == TASK_RUNNING)  \
+   current-state = (state);   \
+   else\
+   set_mb(current-state, (state));\
+} while (0)
+
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 
 #define __set_task_state(tsk, state_value) \
@@ -256,17 +277,6 @@ extern char ___assert_task_state[1 - 2*!!(
set_mb((tsk)-state, (state_value));\
} while (0)
 
-/*
- * set_current_state() includes a barrier so that the write of current-state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- * set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
- *
- * If the caller does not need such serialisation then use 
__set_current_state()
- */
 #define __set_current_state(state_value)   \
do {\
current-task_state_change = _THIS_IP_; \
@@ -275,7 +285,7 @@ extern char ___assert_task_state[1 - 2*!!(
 #define set_current_state(state_value) \
do {\
current-task_state_change = _THIS_IP_; \
-   set_mb(current-state, (state_value));  \
+   ___set_current_state(state_value);  \
} while (0)
 
 #else
@@ -285,21 +295,10 @@ extern char ___assert_task_state[1 - 2*!!(
 #define set_task_state(tsk, state_value)   \
set_mb((tsk)-state, (state_value))
 
-/*
- * set_current_state() includes a barrier so that the write of current-state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- * set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
- *
- * If the caller does not need such serialisation then use 
__set_current_state()
- */
 #define __set_current_state(state_value)   \
do { current-state = (state_value); } while (0)
 #define set_current_state(state_value) \
-   set_mb(current-state, (state_value))
+   ___set_current_state(state_value);
 
 #endif
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-06 Thread Linus Torvalds

On Fri, Feb 6, 2015 at 8:39 AM, Peter Zijlstra pet...@infradead.org wrote:

 I like the macro one though; I worry a wee bit about non-documented
 cases through. If someone is doing something way subtle we'll break it
 :/

Yeah, I agree. And I wonder how much we should even care. People who
do this thing by hand - rather than using the wait-event model - are
*likely* legacy stuff or just random drivers, or simply stuff that
isn't all that performance-sensitive.

So I dunno how worthwhile this all is.

 Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-05 Thread Bruno Prémont

On Thu, 29 January 2015 Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote:
> >
> > The WARN() was already changed to a WARN_ONCE().
> 
> Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up
> always happening.
> 
> So I think the right fix is to:
> 
>  - warn once like we do
> 
>  - but *not* do that __set_current_state() which was always total crap anyway
> 
> Why do I say "total crap"? Because of two independent issues:
> 
>  (a) it actually changes behavior for a debug vs non-debug kernel,
> which is a really bad idea to begin with
> 
>  (b) it's really wrong. The whole "nested sleep" case was never a
> major bug to begin with, just a possible inefficiency where constant
> nested sleeps would possibly make the outer sleep not sleep. But that
> "could possibly make" case was the unlikely case, and the debug patch
> made it happen *all* the time by explicitly setting things running.
> 
> So I think the proper patch is the attached.
> 
> The comment is also crap. The comment says
> 
> "Blocking primitives will set (and therefore destroy) current->state 
> [...]"
> 
> but the reality is that they *may* set it, and only in the unlikely
> slow-path where they actually block.
> 
> So doing this in "__may_sleep()" is just bogus and horrible horrible
> crap. It turns the "harmless ugliness" into a real *harmful* bug. The
> key word of "__may_sleep()" is that "MAY" part. It's a debug thing to
> make relatively rare cases show up.
> 
> PeterZ, please don't make "debugging" patches like this. Ever again.
> Because this was just stupid, and it took me too long to realize that
> despite the warning being shut up, the debug patch was still actively
> doing bad bad things.
> 
> Ingo, maybe you'd want to apply this through the scheduler tree, the
> way you already did the WARN_ONCE() thing.
> 
> Bruno, does this finally actually fix your pccard thing?

Tested the variant that was applied by running rc7 and it fixes the
endless loop.
The loop is now replaced by a single WARN() trace - I guess expected:

[3.083647] [ cut here ]
[3.087477] WARNING: CPU: 0 PID: 67 at 
/usr/src/linux-git/kernel/sched/core.c:7300 __might_sleep+0x79/0x90()
[3.091357] do not call blocking ops when !TASK_RUNNING; state=1 set at 
[] pccardd+0xa0/0x3e0
[3.095232] Modules linked in:
[3.099020] CPU: 0 PID: 67 Comm: pccardd Not tainted 
3.19.0-rc7-3-g67288c4 #17
[3.102760] Hardware name: Acer TravelMate 660/TravelMate 660, BIOS 3A19 
01/14/2004
[3.106504]  c212def4 c212def4 c212deb4 c16caf23 c212dee4 c10416fd c1907334 
c212df10
[3.110315]  0043 c1907380 1c84 c105a099 c105a099 c1442040 0001 
f5f54bc0
[3.114143]  c212defc c104176e 0009 c212def4 c1907334 c212df10 c212df30 
c105a099
[3.117960] Call Trace:
[3.121703]  [] dump_stack+0x16/0x18
[3.125447]  [] warn_slowpath_common+0x7d/0xc0
[3.129172]  [] ? __might_sleep+0x79/0x90
[3.132868]  [] ? __might_sleep+0x79/0x90
[3.136500]  [] ? pccardd+0xa0/0x3e0
[3.140092]  [] warn_slowpath_fmt+0x2e/0x30
[3.143657]  [] __might_sleep+0x79/0x90
[3.147209]  [] ? pccardd+0xa0/0x3e0
[3.150747]  [] ? pccardd+0xa0/0x3e0
[3.154256]  [] mutex_lock+0x17/0x2a
[3.157734]  [] pccardd+0xe9/0x3e0
[3.161207]  [] ? pcmcia_socket_uevent+0x30/0x30
[3.164660]  [] kthread+0xa0/0xc0
[3.168059]  [] ret_from_kernel_thread+0x20/0x30
[3.171436]  [] ? kthread_worker_fn+0x140/0x140
[3.174796] ---[ end trace c3f708b642e3d8f0 ]---

>From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check
is another issue left for the future.

Thanks,
Bruno

>   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-05 Thread Bruno Prémont

On Thu, 29 January 2015 Linus Torvalds wrote:
 On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote:
 
  The WARN() was already changed to a WARN_ONCE().
 
 Oh, but I notice that the __set_current_state(TASK_RUNNING) ends up
 always happening.
 
 So I think the right fix is to:
 
  - warn once like we do
 
  - but *not* do that __set_current_state() which was always total crap anyway
 
 Why do I say total crap? Because of two independent issues:
 
  (a) it actually changes behavior for a debug vs non-debug kernel,
 which is a really bad idea to begin with
 
  (b) it's really wrong. The whole nested sleep case was never a
 major bug to begin with, just a possible inefficiency where constant
 nested sleeps would possibly make the outer sleep not sleep. But that
 could possibly make case was the unlikely case, and the debug patch
 made it happen *all* the time by explicitly setting things running.
 
 So I think the proper patch is the attached.
 
 The comment is also crap. The comment says
 
 Blocking primitives will set (and therefore destroy) current-state 
 [...]
 
 but the reality is that they *may* set it, and only in the unlikely
 slow-path where they actually block.
 
 So doing this in __may_sleep() is just bogus and horrible horrible
 crap. It turns the harmless ugliness into a real *harmful* bug. The
 key word of __may_sleep() is that MAY part. It's a debug thing to
 make relatively rare cases show up.
 
 PeterZ, please don't make debugging patches like this. Ever again.
 Because this was just stupid, and it took me too long to realize that
 despite the warning being shut up, the debug patch was still actively
 doing bad bad things.
 
 Ingo, maybe you'd want to apply this through the scheduler tree, the
 way you already did the WARN_ONCE() thing.
 
 Bruno, does this finally actually fix your pccard thing?

Tested the variant that was applied by running rc7 and it fixes the
endless loop.
The loop is now replaced by a single WARN() trace - I guess expected:

[3.083647] [ cut here ]
[3.087477] WARNING: CPU: 0 PID: 67 at 
/usr/src/linux-git/kernel/sched/core.c:7300 __might_sleep+0x79/0x90()
[3.091357] do not call blocking ops when !TASK_RUNNING; state=1 set at 
[c1442040] pccardd+0xa0/0x3e0
[3.095232] Modules linked in:
[3.099020] CPU: 0 PID: 67 Comm: pccardd Not tainted 
3.19.0-rc7-3-g67288c4 #17
[3.102760] Hardware name: Acer TravelMate 660/TravelMate 660, BIOS 3A19 
01/14/2004
[3.106504]  c212def4 c212def4 c212deb4 c16caf23 c212dee4 c10416fd c1907334 
c212df10
[3.110315]  0043 c1907380 1c84 c105a099 c105a099 c1442040 0001 
f5f54bc0
[3.114143]  c212defc c104176e 0009 c212def4 c1907334 c212df10 c212df30 
c105a099
[3.117960] Call Trace:
[3.121703]  [c16caf23] dump_stack+0x16/0x18
[3.125447]  [c10416fd] warn_slowpath_common+0x7d/0xc0
[3.129172]  [c105a099] ? __might_sleep+0x79/0x90
[3.132868]  [c105a099] ? __might_sleep+0x79/0x90
[3.136500]  [c1442040] ? pccardd+0xa0/0x3e0
[3.140092]  [c104176e] warn_slowpath_fmt+0x2e/0x30
[3.143657]  [c105a099] __might_sleep+0x79/0x90
[3.147209]  [c1442040] ? pccardd+0xa0/0x3e0
[3.150747]  [c1442040] ? pccardd+0xa0/0x3e0
[3.154256]  [c16cf447] mutex_lock+0x17/0x2a
[3.157734]  [c1442089] pccardd+0xe9/0x3e0
[3.161207]  [c1441fa0] ? pcmcia_socket_uevent+0x30/0x30
[3.164660]  [c1056990] kthread+0xa0/0xc0
[3.168059]  [c16d1040] ret_from_kernel_thread+0x20/0x30
[3.171436]  [c10568f0] ? kthread_worker_fn+0x140/0x140
[3.174796] ---[ end trace c3f708b642e3d8f0 ]---

From my reading of the thread fixing pccardd/sched TASK_RUNNING usage/check
is another issue left for the future.

Thanks,
Bruno

   Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-03 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote:
>
> > PeterZ, please don't make "debugging" patches like this. Ever 
> > again. Because this was just stupid, and it took me too long 
> > to realize that despite the warning being shut up, the debug 
> > patch was still actively doing bad bad things.
> 
> Understood.

It was my fault too - I should have realized this side effect of 
the new debugging facility when I merged it, which side effect, 
as Linus pointed it out, is a really bad idea.

We are generally pretty good at doing transparent debugging in 
the scheduler and in the locking code, but screwed up this time 
around :-(

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-03 Thread Ingo Molnar


* Peter Zijlstra pet...@infradead.org wrote:

 On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote:

  PeterZ, please don't make debugging patches like this. Ever 
  again. Because this was just stupid, and it took me too long 
  to realize that despite the warning being shut up, the debug 
  patch was still actively doing bad bad things.
 
 Understood.

It was my fault too - I should have realized this side effect of 
the new debugging facility when I merged it, which side effect, 
as Linus pointed it out, is a really bad idea.

We are generally pretty good at doing transparent debugging in 
the scheduler and in the locking code, but screwed up this time 
around :-(

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-02 Thread Peter Zijlstra

On Sun, Feb 01, 2015 at 12:09:32PM -0800, Linus Torvalds wrote:
> Now, I have the patch that removes that thing (but I was hoping to get
> it from the scheduler tree before doing rc7, which seems to not have
> happened), but yes, that together with your patch seems like it should
> fix all the nasty bug-inducing crud where the "debugging helpers" end
> up silently changing core process state.
> 
> I'll just combine it with yours to avoid extra noise in this area, and
> mark you as the author, fixing *both* of the incorrect state changes.
> Ok?

Ah I see it in your tree; I was about to suggest:

-# define sched_annotate_sleep()__set_current_state(TASK_RUNNING)
+# define sched_annotate_sleep()do { current->task_state_change = 0; } 
while (0)

Instead of the assignment, which has a rvalue.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-02 Thread Peter Zijlstra

On Sat, Jan 31, 2015 at 01:54:36PM -0800, Linus Torvalds wrote:
> On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra  wrote:
> >
> > All the stuff it flagged are genuinely wrong, albeit not disastrously
> > so, things mostly just work.
> 
> I really disagree.
> 
> They weren't wrong. They *could* occasionally result in extra
> reschedules, but that was never wrong to begin with.
> 
> But the debugging code made that much much worse, and made that "could
> result in extra reschedules" happen all the time if it triggered.

Yes, that was a bad choice. Again, sorry about that.

> So you say "genuinely wrong", and I say "but that's how things were
> designed - it's an optimistic approach, not an exact one". Your
> debugging code changed that behavior, and actually introduced a real
> bug, exactly because you felt that the "no nested sleeps" was a harder
> requirement than it has ever actually been.
> 
> In other words, I think the debugging code itself is wrong, and then
> that sched_annotate_sleep() thing is just a symptom of how it is
> wrong. If you have to sprinkle these kinds of random workarounds in a
> few core scheduler places (ok, mainly "wait()" paths, it looks like),
> why would you expect random *drivers* to have to care about things
> that even core kernel code says "I'm not going to care about this,
> I'll just shut the warning up, because the warning is wrong".

I was not aware this pattern was so wide spread / accepted. I have
totally underestimated the amount of fallout here.

> So don't get me wrong - I think the whole "add debug code to find
> places where we might have issues" was well worth it, and resulted in
> improvements.
> 
> But once the low-hanging fruit and the core code that everybody hits
> has been fixed, and people cannot apparently even be bothered with the
> other cases it finds (like the pccardd case), at that point the value
> of the debug code becomes rather less obvious.

My bad (again) for not paying proper attention there. No excuses for
that.

> The pccardd example is an example of legacy use of our old and
> original semantics of how the whole nested sleep was supposed to work.
> And it *does* work. It's not a bug. It's how things have worked time
> immemorial, and clearly nobody is really willing to bother with
> changing working - but legacy - cardbus code.  And at that point, I
> think the debug code is actually *wrong*, and causes more problems
> than it "fixes".
> 
> And debug code that causes more problems that it fixes should either
> be removed, or improved to the point where the problems go away.
> 
> The "improved" part might be about saying "it's actually perfectly
> _fine_ to have nested sleeps, as long as it is truly rare that the
> nested sleep actually sleeps". And thus make the debug code really
> test that it's *rare*, rather than test that it never happens. Warn if
> it happens more than a couple of times a second for any particular
> process, or something like that.

So the thing that makes it work is the fact that schedule is called in
a loop, without that loop things come apart very quickly.

Now most core primitives have this loop, and are safe from spurious
wakeups -- and I think we can class this under that. But there's heaps
of legacy code that has non looping calls of schedule and really breaks
with spurious wakeups (I ran into that a few years ago when I made
schedule() return 'randomly' for some other reason).

So I worry about separating nesting sleep in loops, which are mostly
harmless vs the non-loop case which is really bad.

FWIW the bug that started all this was someone calling blocking
functions from io_schedule():

io_schedule() -> blk_flush_plug() -> block-layer-magic ->
device-magic() -> mutex_lock().

And mutex_lock(() used:

if (need_resched())
schedule();

for preemption and would never return (because !TASK_RUNNING).

Now, we've since fixed the mutex code to not assume TASK_RUNNING, so we
won't actually trigger the reported 'deadlock' anymore.

In any case, I can certainly make the warning go away for now and try
again later with a (hopefully) more intelligent version.

On a related note, would it be possible to make sparse/coccinelle try
and warn on broken wait primitives? ie, no loop around schedule(), no
conditional after set_current_state()?

Examples:

drivers/atm/atmtcp.c-   while (test_bit(flag,>flags) == old_test) {
drivers/atm/atmtcp.c-   mb();
drivers/atm/atmtcp.c-   out_vcc = PRIV(vcc->dev) ? PRIV(vcc->dev)->vcc 
: NULL;
drivers/atm/atmtcp.c-   if (!out_vcc) {
drivers/atm/atmtcp.c-   error = -EUNATCH;
drivers/atm/atmtcp.c-   break;
drivers/atm/atmtcp.c-   }

set_bit(flag, >flags);
wake_up_process(foo);

drivers/atm/atmtcp.c-   set_current_state(TASK_UNINTERRUPTIBLE);
drivers/atm/atmtcp.c:   schedule();
drivers/atm/atmtcp.c-   }

Would not ever wake again it seems.

Re: Linux 3.19-rc5

2015-02-02 Thread Zdenek Kabelac


Dne 30.1.2015 v 02:49 Rafael J. Wysocki napsal(a):

On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote:

On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki  wrote:

On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:


Yeah, but the debug check is triggering worse behavior, requiring
bisecting back to the debug commit.


Yes, it is.

So I'm wondering is anyone is working on fixing this in any way?

It kind of sucks when this is happening on an otherwise perfectly usable
old(ish) machine ...


The WARN() was already changed to a WARN_ONCE().

So that debug check doesn't cause problems any more. If somebody is
bisecting something else, and the WARN() is a problem for those
intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
should get you past that point.

IOW, this really shouldn't be an issue.

Does the pccard thing still not work?


Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP
unset, the problem with 99+% CPU load from pccardd goes away, so thanks for
the hint.



Ok - I  can now 'use'  3.19-rc7 on T61 (C2D) without having a single core 
occupied on 100% with pccardd, but I still get this one logged:



[2.185320] pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c-0x0f:
[2.185363]  excluding 0xc-0xc 0xe-0xf
[2.185526] pcmcia_socket pcmcia_socket0: cs: memory probe 
0xa000-0xa0ff:

[2.185559]  excluding 0xa000-0xa0ff
[2.185720] pcmcia_socket pcmcia_socket0: cs: memory probe 
0x6000-0x60ff:

[2.185754]  excluding 0x6000-0x60ff
[2.297692] [ cut here ]
[2.297698] WARNING: CPU: 1 PID: 185 at kernel/sched/core.c:7300 
__might_sleep+0xae/0xc0()
[2.297702] do not call blocking ops when !TASK_RUNNING; state=1 set at 
[] pccardd+0xe8/0x490
[2.297712] Modules linked in: uhci_hcd sr_mod cdrom ehci_pci ehci_hcd 
yenta_socket usbcore usb_common video backlight autofs4
[2.297715] CPU: 1 PID: 185 Comm: pccardd Not tainted 
3.19.0-rc7-2-g9afdf96 #5
[2.297716] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 
03/18/2011
[2.297719]  819e4d8a 8800b8bdbc58 8163926d 
810a9e1e
[2.297721]  8800b8bdbca8 8800b8bdbc98 810587ba 
8800
[2.297724]  819e6073 026d  
0080

[2.297725] Call Trace:
[2.297729]  [] dump_stack+0x4f/0x7b
[2.297732]  [] ? down_trylock+0x2e/0x40
[2.297736]  [] warn_slowpath_common+0x8a/0xc0
[2.297738]  [] warn_slowpath_fmt+0x46/0x50
[2.297741]  [] ? pccardd+0xe8/0x490
[2.297742]  [] ? pccardd+0xe8/0x490
[2.297744]  [] __might_sleep+0xae/0xc0
[2.297747]  [] mutex_lock_nested+0x2e/0x440
[2.297749]  [] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[2.297753]  [] ? preempt_count_sub+0xab/0x100
[2.297755]  [] pccardd+0x150/0x490
[2.297757]  [] ? pcmcia_socket_dev_complete+0x40/0x40
[2.297760]  [] kthread+0x11f/0x140
[2.297763]  [] ? kthread_create_on_node+0x260/0x260
[2.297766]  [] ret_from_fork+0x7c/0xb0
[2.297768]  [] ? kthread_create_on_node+0x260/0x260
[2.297770] ---[ end trace ed9e591061d223e6 ]---
[2.464635] usb 3-1: new full-speed USB device number 2 using uhci_hcd




and of course number of these:

[   29.660708] i915 :00:02.0: fb0: inteldrmfb frame buffer device
[   29.667061] i915 :00:02.0: registered panic notifier
[   30.314206] [ cut here ]
[   30.318866] WARNING: CPU: 0 PID: 1010 at drivers/gpu/drm/drm_irq.c:1121 
drm_wait_one_vblank+0x180/0x190 [drm]()

[   30.329085] vblank not available on crtc 1, ret=-22
[   30.334003] Modules linked in: i915 i2c_algo_bit drm_kms_helper drm 
xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun brid
ge stp llc ipv6 ebtables ip6_tables iptable_filter ip_tables x_tables bnep 
iTCO_wdt hid_generic iTCO_vendor_support snd_hda_codec_analo
g snd_hda_codec_generic coretemp kvm_intel kvm microcode usbhid psmouse 
serio_raw hid arc4 r852 sm_common nand i2c_i801 nand_ecc i2c_co
re nand_ids mtd btusb bluetooth iwl3945 sdhci_pci r592 iwlegacy memstick 
pcmcia snd_hda_intel sdhci mac80211 snd_hda_controller mmc_cor
e snd_hda_codec snd_hwdep lpc_ich snd_seq mfd_core snd_seq_device snd_pcm 
e1000e cfg80211 ptp thinkpad_acpi snd_timer pps_core wmi nvra

m
[   30.406592]  snd soundcore evdev nfsd auth_rpcgss oid_registry nfs_acl 
lockd grace binfmt_misc loop sunrpc uhci_hcd sr_mod cdrom ehc

i_pci ehci_hcd yenta_socket usbcore usb_common video backlight autofs4
[   30.424031] CPU: 1 PID: 1010 Comm: Xorg.bin Tainted: GW 
3.19.0-rc7-2-g9afdf96 #5
[   30.432930] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 
03/18/2011
[   30.440627]  a0e1017f 880135a37a28 8163926d

Re: Linux 3.19-rc5

2015-02-02 Thread Peter Zijlstra

On Sat, Jan 31, 2015 at 01:54:36PM -0800, Linus Torvalds wrote:
 On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra pet...@infradead.org wrote:
 
  All the stuff it flagged are genuinely wrong, albeit not disastrously
  so, things mostly just work.
 
 I really disagree.
 
 They weren't wrong. They *could* occasionally result in extra
 reschedules, but that was never wrong to begin with.
 
 But the debugging code made that much much worse, and made that could
 result in extra reschedules happen all the time if it triggered.

Yes, that was a bad choice. Again, sorry about that.

 So you say genuinely wrong, and I say but that's how things were
 designed - it's an optimistic approach, not an exact one. Your
 debugging code changed that behavior, and actually introduced a real
 bug, exactly because you felt that the no nested sleeps was a harder
 requirement than it has ever actually been.
 
 In other words, I think the debugging code itself is wrong, and then
 that sched_annotate_sleep() thing is just a symptom of how it is
 wrong. If you have to sprinkle these kinds of random workarounds in a
 few core scheduler places (ok, mainly wait() paths, it looks like),
 why would you expect random *drivers* to have to care about things
 that even core kernel code says I'm not going to care about this,
 I'll just shut the warning up, because the warning is wrong.

I was not aware this pattern was so wide spread / accepted. I have
totally underestimated the amount of fallout here.

 So don't get me wrong - I think the whole add debug code to find
 places where we might have issues was well worth it, and resulted in
 improvements.
 
 But once the low-hanging fruit and the core code that everybody hits
 has been fixed, and people cannot apparently even be bothered with the
 other cases it finds (like the pccardd case), at that point the value
 of the debug code becomes rather less obvious.

My bad (again) for not paying proper attention there. No excuses for
that.

 The pccardd example is an example of legacy use of our old and
 original semantics of how the whole nested sleep was supposed to work.
 And it *does* work. It's not a bug. It's how things have worked time
 immemorial, and clearly nobody is really willing to bother with
 changing working - but legacy - cardbus code.  And at that point, I
 think the debug code is actually *wrong*, and causes more problems
 than it fixes.
 
 And debug code that causes more problems that it fixes should either
 be removed, or improved to the point where the problems go away.
 
 The improved part might be about saying it's actually perfectly
 _fine_ to have nested sleeps, as long as it is truly rare that the
 nested sleep actually sleeps. And thus make the debug code really
 test that it's *rare*, rather than test that it never happens. Warn if
 it happens more than a couple of times a second for any particular
 process, or something like that.

So the thing that makes it work is the fact that schedule is called in
a loop, without that loop things come apart very quickly.

Now most core primitives have this loop, and are safe from spurious
wakeups -- and I think we can class this under that. But there's heaps
of legacy code that has non looping calls of schedule and really breaks
with spurious wakeups (I ran into that a few years ago when I made
schedule() return 'randomly' for some other reason).

So I worry about separating nesting sleep in loops, which are mostly
harmless vs the non-loop case which is really bad.

FWIW the bug that started all this was someone calling blocking
functions from io_schedule():

io_schedule() - blk_flush_plug() - block-layer-magic -
device-magic() - mutex_lock().

And mutex_lock(() used:

if (need_resched())
schedule();

for preemption and would never return (because !TASK_RUNNING).

Now, we've since fixed the mutex code to not assume TASK_RUNNING, so we
won't actually trigger the reported 'deadlock' anymore.

In any case, I can certainly make the warning go away for now and try
again later with a (hopefully) more intelligent version.




On a related note, would it be possible to make sparse/coccinelle try
and warn on broken wait primitives? ie, no loop around schedule(), no
conditional after set_current_state()?

Examples:

drivers/atm/atmtcp.c-   while (test_bit(flag,vcc-flags) == old_test) {
drivers/atm/atmtcp.c-   mb();
drivers/atm/atmtcp.c-   out_vcc = PRIV(vcc-dev) ? PRIV(vcc-dev)-vcc 
: NULL;
drivers/atm/atmtcp.c-   if (!out_vcc) {
drivers/atm/atmtcp.c-   error = -EUNATCH;
drivers/atm/atmtcp.c-   break;
drivers/atm/atmtcp.c-   }

set_bit(flag, vcc-flags);
wake_up_process(foo);

drivers/atm/atmtcp.c-   set_current_state(TASK_UNINTERRUPTIBLE);
drivers/atm/atmtcp.c:   schedule();
drivers/atm/atmtcp.c-   }

Would not ever wake again it seems.



And I'm not entire sure what this is supposed to do,

Re: Linux 3.19-rc5

2015-02-02 Thread Zdenek Kabelac


Dne 30.1.2015 v 02:49 Rafael J. Wysocki napsal(a):

On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote:

On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:

On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:


Yeah, but the debug check is triggering worse behavior, requiring
bisecting back to the debug commit.


Yes, it is.

So I'm wondering is anyone is working on fixing this in any way?

It kind of sucks when this is happening on an otherwise perfectly usable
old(ish) machine ...


The WARN() was already changed to a WARN_ONCE().

So that debug check doesn't cause problems any more. If somebody is
bisecting something else, and the WARN() is a problem for those
intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
should get you past that point.

IOW, this really shouldn't be an issue.

Does the pccard thing still not work?


Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP
unset, the problem with 99+% CPU load from pccardd goes away, so thanks for
the hint.



Ok - I  can now 'use'  3.19-rc7 on T61 (C2D) without having a single core 
occupied on 100% with pccardd, but I still get this one logged:



[2.185320] pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c-0x0f:
[2.185363]  excluding 0xc-0xc 0xe-0xf
[2.185526] pcmcia_socket pcmcia_socket0: cs: memory probe 
0xa000-0xa0ff:

[2.185559]  excluding 0xa000-0xa0ff
[2.185720] pcmcia_socket pcmcia_socket0: cs: memory probe 
0x6000-0x60ff:

[2.185754]  excluding 0x6000-0x60ff
[2.297692] [ cut here ]
[2.297698] WARNING: CPU: 1 PID: 185 at kernel/sched/core.c:7300 
__might_sleep+0xae/0xc0()
[2.297702] do not call blocking ops when !TASK_RUNNING; state=1 set at 
[81518e18] pccardd+0xe8/0x490
[2.297712] Modules linked in: uhci_hcd sr_mod cdrom ehci_pci ehci_hcd 
yenta_socket usbcore usb_common video backlight autofs4
[2.297715] CPU: 1 PID: 185 Comm: pccardd Not tainted 
3.19.0-rc7-2-g9afdf96 #5
[2.297716] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 
03/18/2011
[2.297719]  819e4d8a 8800b8bdbc58 8163926d 
810a9e1e
[2.297721]  8800b8bdbca8 8800b8bdbc98 810587ba 
8800
[2.297724]  819e6073 026d  
0080

[2.297725] Call Trace:
[2.297729]  [8163926d] dump_stack+0x4f/0x7b
[2.297732]  [810a9e1e] ? down_trylock+0x2e/0x40
[2.297736]  [810587ba] warn_slowpath_common+0x8a/0xc0
[2.297738]  [81058836] warn_slowpath_fmt+0x46/0x50
[2.297741]  [81518e18] ? pccardd+0xe8/0x490
[2.297742]  [81518e18] ? pccardd+0xe8/0x490
[2.297744]  [8108412e] __might_sleep+0xae/0xc0
[2.297747]  [8163cd7e] mutex_lock_nested+0x2e/0x440
[2.297749]  [8164189d] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[2.297753]  [8108b42b] ? preempt_count_sub+0xab/0x100
[2.297755]  [81518e80] pccardd+0x150/0x490
[2.297757]  [81518d30] ? pcmcia_socket_dev_complete+0x40/0x40
[2.297760]  [8107db1f] kthread+0x11f/0x140
[2.297763]  [8107da00] ? kthread_create_on_node+0x260/0x260
[2.297766]  [8164242c] ret_from_fork+0x7c/0xb0
[2.297768]  [8107da00] ? kthread_create_on_node+0x260/0x260
[2.297770] ---[ end trace ed9e591061d223e6 ]---
[2.464635] usb 3-1: new full-speed USB device number 2 using uhci_hcd




and of course number of these:

[   29.660708] i915 :00:02.0: fb0: inteldrmfb frame buffer device
[   29.667061] i915 :00:02.0: registered panic notifier
[   30.314206] [ cut here ]
[   30.318866] WARNING: CPU: 0 PID: 1010 at drivers/gpu/drm/drm_irq.c:1121 
drm_wait_one_vblank+0x180/0x190 [drm]()

[   30.329085] vblank not available on crtc 1, ret=-22
[   30.334003] Modules linked in: i915 i2c_algo_bit drm_kms_helper drm 
xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun brid
ge stp llc ipv6 ebtables ip6_tables iptable_filter ip_tables x_tables bnep 
iTCO_wdt hid_generic iTCO_vendor_support snd_hda_codec_analo
g snd_hda_codec_generic coretemp kvm_intel kvm microcode usbhid psmouse 
serio_raw hid arc4 r852 sm_common nand i2c_i801 nand_ecc i2c_co
re nand_ids mtd btusb bluetooth iwl3945 sdhci_pci r592 iwlegacy memstick 
pcmcia snd_hda_intel sdhci mac80211 snd_hda_controller mmc_cor
e snd_hda_codec snd_hwdep lpc_ich snd_seq mfd_core snd_seq_device snd_pcm 
e1000e cfg80211 ptp thinkpad_acpi snd_timer pps_core wmi nvra

m
[   30.406592]  snd soundcore evdev nfsd auth_rpcgss oid_registry nfs_acl 
lockd grace binfmt_misc loop sunrpc uhci_hcd sr_mod cdrom ehc

i_pci ehci_hcd yenta_socket usbcore

Re: Linux 3.19-rc5

2015-02-02 Thread Peter Zijlstra

On Sun, Feb 01, 2015 at 12:09:32PM -0800, Linus Torvalds wrote:
 Now, I have the patch that removes that thing (but I was hoping to get
 it from the scheduler tree before doing rc7, which seems to not have
 happened), but yes, that together with your patch seems like it should
 fix all the nasty bug-inducing crud where the debugging helpers end
 up silently changing core process state.
 
 I'll just combine it with yours to avoid extra noise in this area, and
 mark you as the author, fixing *both* of the incorrect state changes.
 Ok?

Ah I see it in your tree; I was about to suggest:

-# define sched_annotate_sleep()__set_current_state(TASK_RUNNING)
+# define sched_annotate_sleep()do { current-task_state_change = 0; } 
while (0)

Instead of the assignment, which has a rvalue.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-01 Thread Oleg Nesterov

On 02/01, Linus Torvalds wrote:
>
> On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov  wrote:
> >
> > And personally I agree. sched_annotate_sleep() looks self-documented, it
> > is clear that it is used to suppress the warning.
>
> But *that's not the problem*.
>
> If it was just silencing the warning, things would be fine.
>
> But it is actively screwing task state up, and actually changing
> behavior of the kernel (in a very subtle part of the code too), and
> doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE
> DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE.

I understand, that is why I suggested to change it.

> I like your patch, but I'm going to combine it with mine that actually
> fixes a real bug,

Sure, I agree.

> because what you don't see in that patch of yours:
> ...
>
>
> is that the whole "if (WARN_ONCE()" remains horribly buggy, because of
> the line that follows it:
>
> __set_current_state(TASK_RUNNING);
>
> in the if-statement.

Ah. I just forgot to mention that this change should be rediffed on top
of your patch, of course it is not enough.

> I'll just combine it with yours to avoid extra noise in this area, and
> mark you as the author, fixing *both* of the incorrect state changes.
> Ok?

Please combine them, but don't mark me as an author, I do not want to
take the false credits ;)

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-01 Thread Linus Torvalds

On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov  wrote:
>
> And personally I agree. sched_annotate_sleep() looks self-documented, it
> is clear that it is used to suppress the warning.

But *that's not the problem*.

If it was just silencing the warning, things would be fine.

But it is actively screwing task state up, and actually changing
behavior of the kernel (in a very subtle part of the code too), and
doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE
DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE.

That's the thing that upsets me. This is debug code. It's not even
debugging things that are "buggy" - its' just finding things that can
be potentially slightly inefficient. And it already introduced a
subtle bug once.

(Ok, it wasn't *that* subtle in the end, but it needed both a good
bisection result and some reading of unrelated source code to figure
out, so it certainly wasn't really obvious).

> Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP
> adds?

Now this patch I agree with 100% percent, apart from you calling it a
"subtle change". It's more than a "subtle change", this was adding a
real and present bug that took two weeks to get fixed!

(And by "took two weeks to get fixed" I mean "not actually fixed yet,
but now we at least know what was going on").

I like your patch, but I'm going to combine it with mine that actually
fixes a real bug, because what you don't see in that patch of yours:

> --- x/include/linux/kernel.h
> +++ x/include/linux/kernel.h
> @@ -176,7 +176,7 @@ extern int _cond_resched(void);
>   */
>  # define might_sleep() \
> do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while 
> (0)
> -# define sched_annotate_sleep()__set_current_state(TASK_RUNNING)
> +# define sched_annotate_sleep()(current->task_state_change = 0)
>  #else
>static inline void ___might_sleep(const char *file, int line,
>int preempt_offset) { }
> --- x/kernel/sched/core.c
> +++ x/kernel/sched/core.c
> @@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int
>  * since we will exit with TASK_RUNNING make sure we enter with it,
>  * otherwise we will destroy state.
>  */
> -   if (WARN_ONCE(current->state != TASK_RUNNING,
> +   if (WARN_ONCE(current->state != TASK_RUNNING && 
> current->task_state_change,
> "do not call blocking ops when !TASK_RUNNING; "
> "state=%lx set at [<%p>] %pS\n",
> current->state,

is that the whole "if (WARN_ONCE()" remains horribly buggy, because of
the line that follows it:

__set_current_state(TASK_RUNNING);

in the if-statement.

Now, I have the patch that removes that thing (but I was hoping to get
it from the scheduler tree before doing rc7, which seems to not have
happened), but yes, that together with your patch seems like it should
fix all the nasty bug-inducing crud where the "debugging helpers" end
up silently changing core process state.

I'll just combine it with yours to avoid extra noise in this area, and
mark you as the author, fixing *both* of the incorrect state changes.
Ok?

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-01 Thread Oleg Nesterov

On 01/31, Peter Zijlstra wrote:
>
> On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote:
> > On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov  wrote:
> > >
> > > Perhaps sched_annotate_sleep() shouldn't depend on 
> > > CONFIG_DEBUG_ATOMIC_SLEEP
> > > too...
> >
> > Ugh. That thing is horrible. The naming doesn't make it obvious at all
> > that it's actually making sure that we have state set to TASK_RUNNING,
> > and I could easily imagine that it would cause similar "busy-loops
> > while scheduling" issues if anybody ever uses it in the wrong context.
>
> The alternative was putting unconditional __set_task_state(TASK_RUNNING)
> things in a few code paths. I didn't want to cause the extra code in
> case we didn't need them. Particularly the include/net/sock.h one, as I
> know the network people are cycle counters.

And personally I agree. sched_annotate_sleep() looks self-documented, it
is clear that it is used to suppress the warning.

Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP
adds?

Oleg.


--- x/include/linux/kernel.h
+++ x/include/linux/kernel.h
@@ -176,7 +176,7 @@ extern int _cond_resched(void);
  */
 # define might_sleep() \
do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
-# define sched_annotate_sleep()__set_current_state(TASK_RUNNING)
+# define sched_annotate_sleep()(current->task_state_change = 0)
 #else
   static inline void ___might_sleep(const char *file, int line,
   int preempt_offset) { }
--- x/kernel/sched/core.c
+++ x/kernel/sched/core.c
@@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int
 * since we will exit with TASK_RUNNING make sure we enter with it,
 * otherwise we will destroy state.
 */
-   if (WARN_ONCE(current->state != TASK_RUNNING,
+   if (WARN_ONCE(current->state != TASK_RUNNING && 
current->task_state_change,
"do not call blocking ops when !TASK_RUNNING; "
"state=%lx set at [<%p>] %pS\n",
current->state,

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-01 Thread Oleg Nesterov

On 01/31, Peter Zijlstra wrote:

 On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote:
  On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov o...@redhat.com wrote:
  
   Perhaps sched_annotate_sleep() shouldn't depend on 
   CONFIG_DEBUG_ATOMIC_SLEEP
   too...
 
  Ugh. That thing is horrible. The naming doesn't make it obvious at all
  that it's actually making sure that we have state set to TASK_RUNNING,
  and I could easily imagine that it would cause similar busy-loops
  while scheduling issues if anybody ever uses it in the wrong context.

 The alternative was putting unconditional __set_task_state(TASK_RUNNING)
 things in a few code paths. I didn't want to cause the extra code in
 case we didn't need them. Particularly the include/net/sock.h one, as I
 know the network people are cycle counters.

And personally I agree. sched_annotate_sleep() looks self-documented, it
is clear that it is used to suppress the warning.

Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP
adds?

Oleg.


--- x/include/linux/kernel.h
+++ x/include/linux/kernel.h
@@ -176,7 +176,7 @@ extern int _cond_resched(void);
  */
 # define might_sleep() \
do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
-# define sched_annotate_sleep()__set_current_state(TASK_RUNNING)
+# define sched_annotate_sleep()(current-task_state_change = 0)
 #else
   static inline void ___might_sleep(const char *file, int line,
   int preempt_offset) { }
--- x/kernel/sched/core.c
+++ x/kernel/sched/core.c
@@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int
 * since we will exit with TASK_RUNNING make sure we enter with it,
 * otherwise we will destroy state.
 */
-   if (WARN_ONCE(current-state != TASK_RUNNING,
+   if (WARN_ONCE(current-state != TASK_RUNNING  
current-task_state_change,
do not call blocking ops when !TASK_RUNNING; 
state=%lx set at [%p] %pS\n,
current-state,

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-01 Thread Oleg Nesterov

On 02/01, Linus Torvalds wrote:

 On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov o...@redhat.com wrote:
 
  And personally I agree. sched_annotate_sleep() looks self-documented, it
  is clear that it is used to suppress the warning.

 But *that's not the problem*.

 If it was just silencing the warning, things would be fine.

 But it is actively screwing task state up, and actually changing
 behavior of the kernel (in a very subtle part of the code too), and
 doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE
 DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE.

I understand, that is why I suggested to change it.

 I like your patch, but I'm going to combine it with mine that actually
 fixes a real bug,

Sure, I agree.

 because what you don't see in that patch of yours:
 ...


 is that the whole if (WARN_ONCE() remains horribly buggy, because of
 the line that follows it:

 __set_current_state(TASK_RUNNING);

 in the if-statement.

Ah. I just forgot to mention that this change should be rediffed on top
of your patch, of course it is not enough.

 I'll just combine it with yours to avoid extra noise in this area, and
 mark you as the author, fixing *both* of the incorrect state changes.
 Ok?

Please combine them, but don't mark me as an author, I do not want to
take the false credits ;)

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-02-01 Thread Linus Torvalds

On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov o...@redhat.com wrote:

 And personally I agree. sched_annotate_sleep() looks self-documented, it
 is clear that it is used to suppress the warning.

But *that's not the problem*.

If it was just silencing the warning, things would be fine.

But it is actively screwing task state up, and actually changing
behavior of the kernel (in a very subtle part of the code too), and
doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE
DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE.

That's the thing that upsets me. This is debug code. It's not even
debugging things that are buggy - its' just finding things that can
be potentially slightly inefficient. And it already introduced a
subtle bug once.

(Ok, it wasn't *that* subtle in the end, but it needed both a good
bisection result and some reading of unrelated source code to figure
out, so it certainly wasn't really obvious).

 Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP
 adds?

Now this patch I agree with 100% percent, apart from you calling it a
subtle change. It's more than a subtle change, this was adding a
real and present bug that took two weeks to get fixed!

(And by took two weeks to get fixed I mean not actually fixed yet,
but now we at least know what was going on).

I like your patch, but I'm going to combine it with mine that actually
fixes a real bug, because what you don't see in that patch of yours:

 --- x/include/linux/kernel.h
 +++ x/include/linux/kernel.h
 @@ -176,7 +176,7 @@ extern int _cond_resched(void);
   */
  # define might_sleep() \
 do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while 
 (0)
 -# define sched_annotate_sleep()__set_current_state(TASK_RUNNING)
 +# define sched_annotate_sleep()(current-task_state_change = 0)
  #else
static inline void ___might_sleep(const char *file, int line,
int preempt_offset) { }
 --- x/kernel/sched/core.c
 +++ x/kernel/sched/core.c
 @@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int
  * since we will exit with TASK_RUNNING make sure we enter with it,
  * otherwise we will destroy state.
  */
 -   if (WARN_ONCE(current-state != TASK_RUNNING,
 +   if (WARN_ONCE(current-state != TASK_RUNNING  
 current-task_state_change,
 do not call blocking ops when !TASK_RUNNING; 
 state=%lx set at [%p] %pS\n,
 current-state,

is that the whole if (WARN_ONCE() remains horribly buggy, because of
the line that follows it:

__set_current_state(TASK_RUNNING);

in the if-statement.

Now, I have the patch that removes that thing (but I was hoping to get
it from the scheduler tree before doing rc7, which seems to not have
happened), but yes, that together with your patch seems like it should
fix all the nasty bug-inducing crud where the debugging helpers end
up silently changing core process state.

I'll just combine it with yours to avoid extra noise in this area, and
mark you as the author, fixing *both* of the incorrect state changes.
Ok?

  Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Linus Torvalds

On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra  wrote:
>
> All the stuff it flagged are genuinely wrong, albeit not disastrously
> so, things mostly just work.

I really disagree.

They weren't wrong. They *could* occasionally result in extra
reschedules, but that was never wrong to begin with.

But the debugging code made that much much worse, and made that "could
result in extra reschedules" happen all the time if it triggered.

And the whole "set task to sleep early" thing was rather intentional,
and was a fundamental part of the original design for
"select()/poll()" behavior, for example. Yes, we ended up having the
waitqueue functions and using that "pollwake()" and use
"wq->triggered" etc instead, but the whole optimistic early sleep
thing worked reasonably well for a long time even when the "early
TASK_SLEEP" was done before calling *thousands* of random poll
routines.

So you say "genuinely wrong", and I say "but that's how things were
designed - it's an optimistic approach, not an exact one". Your
debugging code changed that behavior, and actually introduced a real
bug, exactly because you felt that the "no nested sleeps" was a harder
requirement than it has ever actually been.

In other words, I think the debugging code itself is wrong, and then
that sched_annotate_sleep() thing is just a symptom of how it is
wrong. If you have to sprinkle these kinds of random workarounds in a
few core scheduler places (ok, mainly "wait()" paths, it looks like),
why would you expect random *drivers* to have to care about things
that even core kernel code says "I'm not going to care about this,
I'll just shut the warning up, because the warning is wrong".

Yes, the fact that select/poll was changed to try to avoid excessive
polling scheduling because it actually *was* a problem under some
loads does say that we generally want to try to avoid nested sleeping.
Because while it is rare and the optimistic approach works fine in
most cases, it certainly *can* become a problem if the optimistic "I'm
normally not going to sleep" thing ends up not being sufficiently
accurate.

So don't get me wrong - I think the whole "add debug code to find
places where we might have issues" was well worth it, and resulted in
improvements.

But once the low-hanging fruit and the core code that everybody hits
has been fixed, and people cannot apparently even be bothered with the
other cases it finds (like the pccardd case), at that point the value
of the debug code becomes rather less obvious.

And the downsides become bigger.

The pccardd example is an example of legacy use of our old and
original semantics of how the whole nested sleep was supposed to work.
And it *does* work. It's not a bug. It's how things have worked time
immemorial, and clearly nobody is really willing to bother with
changing working - but legacy - cardbus code.  And at that point, I
think the debug code is actually *wrong*, and causes more problems
than it "fixes".

And debug code that causes more problems that it fixes should either
be removed, or improved to the point where the problems go away.

The "improved" part might be about saying "it's actually perfectly
_fine_ to have nested sleeps, as long as it is truly rare that the
nested sleep actually sleeps". And thus make the debug code really
test that it's *rare*, rather than test that it never happens. Warn if
it happens more than a couple of times a second for any particular
process, or something like that.

Hmm?

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Peter Zijlstra

On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote:
> On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov  wrote:
> >
> > Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
> > too...
> 
> Ugh. That thing is horrible. The naming doesn't make it obvious at all
> that it's actually making sure that we have state set to TASK_RUNNING,
> and I could easily imagine that it would cause similar "busy-loops
> while scheduling" issues if anybody ever uses it in the wrong context.

The alternative was putting unconditional __set_task_state(TASK_RUNNING)
things in a few code paths. I didn't want to cause the extra code in
case we didn't need them. Particularly the include/net/sock.h one, as I
know the network people are cycle counters.

But this function should indeed be used very rarely. Currently we're at
5 instances.

> So I really think that whole thing is a sign of "the debug
> infrastructure is buggy, and people are introducing fragile things to
> just shut up the false positives".

Aside from this one annotation, which is used in cases where we broke
out of the wait loop but haven't reset task state yet, there have not
really been false positives.

All the stuff it flagged are genuinely wrong, albeit not disastrously
so, things mostly just work.

Should I make the default wait_event() safe against sleeps? It would
make it slightly more expensive, but on the whole that should not really
be a problem.

Alternatively we should provide an alternative to wait_event() that
allows sleeps, but I'm failing to come up with a good name.

> I don't know how to fix it. I really get the feeling that the whole
> new "nested sleep" detection code was a mistake to begin with, since
> it wasn't even a real bug, and it has now created more bugs than it
> ever detected afaik.

I appreciate I caused you some pain here, and I'm very sorry for that.
But I do think we should be little more careful with task::state, on
occasion things do go wrong there and funny stuff happens.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Linus Torvalds

On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov  wrote:
>
> Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
> too...

Ugh. That thing is horrible. The naming doesn't make it obvious at all
that it's actually making sure that we have state set to TASK_RUNNING,
and I could easily imagine that it would cause similar "busy-loops
while scheduling" issues if anybody ever uses it in the wrong context.

So I really think that whole thing is a sign of "the debug
infrastructure is buggy, and people are introducing fragile things to
just shut up the false positives".

I don't know how to fix it. I really get the feeling that the whole
new "nested sleep" detection code was a mistake to begin with, since
it wasn't even a real bug, and it has now created more bugs than it
ever detected afaik.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Peter Zijlstra

On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote:
> PeterZ, please don't make "debugging" patches like this. Ever again.
> Because this was just stupid, and it took me too long to realize that
> despite the warning being shut up, the debug patch was still actively
> doing bad bad things.

Understood.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Bruno Prémont

On Thu, 29 Jan 2015 17:25:07 Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote:
> >
> > The WARN() was already changed to a WARN_ONCE().
> 
> Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up
> always happening.
> 
> So I think the right fix is to:
> 
>  - warn once like we do
> 
>  - but *not* do that __set_current_state() which was always total
> crap anyway
> 
> Why do I say "total crap"? Because of two independent issues:
> 
>  (a) it actually changes behavior for a debug vs non-debug kernel,
> which is a really bad idea to begin with
> 
>  (b) it's really wrong. The whole "nested sleep" case was never a
> major bug to begin with, just a possible inefficiency where constant
> nested sleeps would possibly make the outer sleep not sleep. But that
> "could possibly make" case was the unlikely case, and the debug patch
> made it happen *all* the time by explicitly setting things running.
> 
> So I think the proper patch is the attached.
> 
> The comment is also crap. The comment says
> 
> "Blocking primitives will set (and therefore destroy)
> current->state [...]"
> 
> but the reality is that they *may* set it, and only in the unlikely
> slow-path where they actually block.
> 
> So doing this in "__may_sleep()" is just bogus and horrible horrible
> crap. It turns the "harmless ugliness" into a real *harmful* bug. The
> key word of "__may_sleep()" is that "MAY" part. It's a debug thing to
> make relatively rare cases show up.
> 
> PeterZ, please don't make "debugging" patches like this. Ever again.
> Because this was just stupid, and it took me too long to realize that
> despite the warning being shut up, the debug patch was still actively
> doing bad bad things.
> 
> Ingo, maybe you'd want to apply this through the scheduler tree, the
> way you already did the WARN_ONCE() thing.
> 
> Bruno, does this finally actually fix your pccard thing?

I will report back on Wednesday when I'm back home from FOSDEM. I don't
have the affected machine at hand at the moment.

Thanks for looking into it!
Bruno

>   Linus

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Bruno Prémont

On Thu, 29 Jan 2015 17:25:07 Linus Torvalds wrote:
 On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds wrote:
 
  The WARN() was already changed to a WARN_ONCE().
 
 Oh, but I notice that the __set_current_state(TASK_RUNNING) ends up
 always happening.
 
 So I think the right fix is to:
 
  - warn once like we do
 
  - but *not* do that __set_current_state() which was always total
 crap anyway
 
 Why do I say total crap? Because of two independent issues:
 
  (a) it actually changes behavior for a debug vs non-debug kernel,
 which is a really bad idea to begin with
 
  (b) it's really wrong. The whole nested sleep case was never a
 major bug to begin with, just a possible inefficiency where constant
 nested sleeps would possibly make the outer sleep not sleep. But that
 could possibly make case was the unlikely case, and the debug patch
 made it happen *all* the time by explicitly setting things running.
 
 So I think the proper patch is the attached.
 
 The comment is also crap. The comment says
 
 Blocking primitives will set (and therefore destroy)
 current-state [...]
 
 but the reality is that they *may* set it, and only in the unlikely
 slow-path where they actually block.
 
 So doing this in __may_sleep() is just bogus and horrible horrible
 crap. It turns the harmless ugliness into a real *harmful* bug. The
 key word of __may_sleep() is that MAY part. It's a debug thing to
 make relatively rare cases show up.
 
 PeterZ, please don't make debugging patches like this. Ever again.
 Because this was just stupid, and it took me too long to realize that
 despite the warning being shut up, the debug patch was still actively
 doing bad bad things.
 
 Ingo, maybe you'd want to apply this through the scheduler tree, the
 way you already did the WARN_ONCE() thing.
 
 Bruno, does this finally actually fix your pccard thing?

I will report back on Wednesday when I'm back home from FOSDEM. I don't
have the affected machine at hand at the moment.

Thanks for looking into it!
Bruno

   Linus

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Linus Torvalds

On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov o...@redhat.com wrote:

 Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
 too...

Ugh. That thing is horrible. The naming doesn't make it obvious at all
that it's actually making sure that we have state set to TASK_RUNNING,
and I could easily imagine that it would cause similar busy-loops
while scheduling issues if anybody ever uses it in the wrong context.

So I really think that whole thing is a sign of the debug
infrastructure is buggy, and people are introducing fragile things to
just shut up the false positives.

I don't know how to fix it. I really get the feeling that the whole
new nested sleep detection code was a mistake to begin with, since
it wasn't even a real bug, and it has now created more bugs than it
ever detected afaik.

  Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Peter Zijlstra

On Sat, Jan 31, 2015 at 10:32:23AM -0800, Linus Torvalds wrote:
 On Fri, Jan 30, 2015 at 7:47 AM, Oleg Nesterov o...@redhat.com wrote:
 
  Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
  too...
 
 Ugh. That thing is horrible. The naming doesn't make it obvious at all
 that it's actually making sure that we have state set to TASK_RUNNING,
 and I could easily imagine that it would cause similar busy-loops
 while scheduling issues if anybody ever uses it in the wrong context.

The alternative was putting unconditional __set_task_state(TASK_RUNNING)
things in a few code paths. I didn't want to cause the extra code in
case we didn't need them. Particularly the include/net/sock.h one, as I
know the network people are cycle counters.

But this function should indeed be used very rarely. Currently we're at
5 instances.

 So I really think that whole thing is a sign of the debug
 infrastructure is buggy, and people are introducing fragile things to
 just shut up the false positives.

Aside from this one annotation, which is used in cases where we broke
out of the wait loop but haven't reset task state yet, there have not
really been false positives.

All the stuff it flagged are genuinely wrong, albeit not disastrously
so, things mostly just work.

Should I make the default wait_event() safe against sleeps? It would
make it slightly more expensive, but on the whole that should not really
be a problem.

Alternatively we should provide an alternative to wait_event() that
allows sleeps, but I'm failing to come up with a good name.

 I don't know how to fix it. I really get the feeling that the whole
 new nested sleep detection code was a mistake to begin with, since
 it wasn't even a real bug, and it has now created more bugs than it
 ever detected afaik.

I appreciate I caused you some pain here, and I'm very sorry for that.
But I do think we should be little more careful with task::state, on
occasion things do go wrong there and funny stuff happens.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Linus Torvalds

On Sat, Jan 31, 2015 at 12:16 PM, Peter Zijlstra pet...@infradead.org wrote:

 All the stuff it flagged are genuinely wrong, albeit not disastrously
 so, things mostly just work.

I really disagree.

They weren't wrong. They *could* occasionally result in extra
reschedules, but that was never wrong to begin with.

But the debugging code made that much much worse, and made that could
result in extra reschedules happen all the time if it triggered.

And the whole set task to sleep early thing was rather intentional,
and was a fundamental part of the original design for
select()/poll() behavior, for example. Yes, we ended up having the
waitqueue functions and using that pollwake() and use
wq-triggered etc instead, but the whole optimistic early sleep
thing worked reasonably well for a long time even when the early
TASK_SLEEP was done before calling *thousands* of random poll
routines.

So you say genuinely wrong, and I say but that's how things were
designed - it's an optimistic approach, not an exact one. Your
debugging code changed that behavior, and actually introduced a real
bug, exactly because you felt that the no nested sleeps was a harder
requirement than it has ever actually been.

In other words, I think the debugging code itself is wrong, and then
that sched_annotate_sleep() thing is just a symptom of how it is
wrong. If you have to sprinkle these kinds of random workarounds in a
few core scheduler places (ok, mainly wait() paths, it looks like),
why would you expect random *drivers* to have to care about things
that even core kernel code says I'm not going to care about this,
I'll just shut the warning up, because the warning is wrong.

Yes, the fact that select/poll was changed to try to avoid excessive
polling scheduling because it actually *was* a problem under some
loads does say that we generally want to try to avoid nested sleeping.
Because while it is rare and the optimistic approach works fine in
most cases, it certainly *can* become a problem if the optimistic I'm
normally not going to sleep thing ends up not being sufficiently
accurate.

So don't get me wrong - I think the whole add debug code to find
places where we might have issues was well worth it, and resulted in
improvements.

But once the low-hanging fruit and the core code that everybody hits
has been fixed, and people cannot apparently even be bothered with the
other cases it finds (like the pccardd case), at that point the value
of the debug code becomes rather less obvious.

And the downsides become bigger.

The pccardd example is an example of legacy use of our old and
original semantics of how the whole nested sleep was supposed to work.
And it *does* work. It's not a bug. It's how things have worked time
immemorial, and clearly nobody is really willing to bother with
changing working - but legacy - cardbus code.  And at that point, I
think the debug code is actually *wrong*, and causes more problems
than it fixes.

And debug code that causes more problems that it fixes should either
be removed, or improved to the point where the problems go away.

The improved part might be about saying it's actually perfectly
_fine_ to have nested sleeps, as long as it is truly rare that the
nested sleep actually sleeps. And thus make the debug code really
test that it's *rare*, rather than test that it never happens. Warn if
it happens more than a couple of times a second for any particular
process, or something like that.

Hmm?

  Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-31 Thread Peter Zijlstra

On Thu, Jan 29, 2015 at 05:25:07PM -0800, Linus Torvalds wrote:
 PeterZ, please don't make debugging patches like this. Ever again.
 Because this was just stupid, and it took me too long to realize that
 despite the warning being shut up, the debug patch was still actively
 doing bad bad things.

Understood.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-30 Thread Oleg Nesterov

On 01/29, Linus Torvalds wrote:
>
>  (a) it actually changes behavior for a debug vs non-debug kernel,
> which is a really bad idea to begin with

Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
too...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-30 Thread Oleg Nesterov

On 01/29, Linus Torvalds wrote:

  (a) it actually changes behavior for a debug vs non-debug kernel,
 which is a really bad idea to begin with

Perhaps sched_annotate_sleep() shouldn't depend on CONFIG_DEBUG_ATOMIC_SLEEP
too...

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Rafael J. Wysocki

On Thursday, January 29, 2015 05:25:07 PM Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds
>  wrote:
> >
> > The WARN() was already changed to a WARN_ONCE().
> 
> Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up
> always happening.
> 
> So I think the right fix is to:
> 
>  - warn once like we do
> 
>  - but *not* do that __set_current_state() which was always total crap anyway
> 
> Why do I say "total crap"? Because of two independent issues:
> 
>  (a) it actually changes behavior for a debug vs non-debug kernel,
> which is a really bad idea to begin with
> 
>  (b) it's really wrong. The whole "nested sleep" case was never a
> major bug to begin with, just a possible inefficiency where constant
> nested sleeps would possibly make the outer sleep not sleep. But that
> "could possibly make" case was the unlikely case, and the debug patch
> made it happen *all* the time by explicitly setting things running.
> 
> So I think the proper patch is the attached.

I applied the patch and compiled the kernel with CONFIG_DEBUG_ATOMIC_SLEEP
set again.  The warning is back, so the DEBUG_ATOMIC_SLEEP thing appears to
be working (and I don't really care about the freaking warning anyway), but the
pccardd load issue is gone, so the patch fixes the problem for me.  Thanks!

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Linus Torvalds

On Thu, Jan 29, 2015 at 5:25 PM, Linus Torvalds
 wrote:
>
> Ingo, maybe you'd want to apply this through the scheduler tree, the
> way you already did the WARN_ONCE() thing.

Side note: I can obviously just commit it myself, but for things that
have obvious maintainers I tend to try to try to push the changes
through channels.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Rafael J. Wysocki

On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki  wrote:
> > On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
> >>
> >> Yeah, but the debug check is triggering worse behavior, requiring
> >> bisecting back to the debug commit.
> >
> > Yes, it is.
> >
> > So I'm wondering is anyone is working on fixing this in any way?
> >
> > It kind of sucks when this is happening on an otherwise perfectly usable
> > old(ish) machine ...
> 
> The WARN() was already changed to a WARN_ONCE().
> 
> So that debug check doesn't cause problems any more. If somebody is
> bisecting something else, and the WARN() is a problem for those
> intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
> should get you past that point.
> 
> IOW, this really shouldn't be an issue.
> 
> Does the pccard thing still not work?

Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP
unset, the problem with 99+% CPU load from pccardd goes away, so thanks for
the hint.

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Linus Torvalds

On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds
 wrote:
>
> The WARN() was already changed to a WARN_ONCE().

Oh, but I notice that the "__set_current_state(TASK_RUNNING) ends up
always happening.

So I think the right fix is to:

 - warn once like we do

 - but *not* do that __set_current_state() which was always total crap anyway

Why do I say "total crap"? Because of two independent issues:

 (a) it actually changes behavior for a debug vs non-debug kernel,
which is a really bad idea to begin with

 (b) it's really wrong. The whole "nested sleep" case was never a
major bug to begin with, just a possible inefficiency where constant
nested sleeps would possibly make the outer sleep not sleep. But that
"could possibly make" case was the unlikely case, and the debug patch
made it happen *all* the time by explicitly setting things running.

So I think the proper patch is the attached.

The comment is also crap. The comment says

"Blocking primitives will set (and therefore destroy) current->state [...]"

but the reality is that they *may* set it, and only in the unlikely
slow-path where they actually block.

So doing this in "__may_sleep()" is just bogus and horrible horrible
crap. It turns the "harmless ugliness" into a real *harmful* bug. The
key word of "__may_sleep()" is that "MAY" part. It's a debug thing to
make relatively rare cases show up.

PeterZ, please don't make "debugging" patches like this. Ever again.
Because this was just stupid, and it took me too long to realize that
despite the warning being shut up, the debug patch was still actively
doing bad bad things.

Ingo, maybe you'd want to apply this through the scheduler tree, the
way you already did the WARN_ONCE() thing.

Bruno, does this finally actually fix your pccard thing?

  Linus
 kernel/sched/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0accc00566e..76aaf5c61114 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7292,13 +7292,12 @@ void __might_sleep(const char *file, int line, int 
preempt_offset)
 * since we will exit with TASK_RUNNING make sure we enter with it,
 * otherwise we will destroy state.
 */
-   if (WARN_ONCE(current->state != TASK_RUNNING,
+   WARN_ONCE(current->state != TASK_RUNNING,
"do not call blocking ops when !TASK_RUNNING; "
"state=%lx set at [<%p>] %pS\n",
current->state,
(void *)current->task_state_change,
-   (void *)current->task_state_change))
-   __set_current_state(TASK_RUNNING);
+   (void *)current->task_state_change);

___might_sleep(file, line, preempt_offset);
 }

Re: Linux 3.19-rc5

2015-01-29 Thread Linus Torvalds

On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki  wrote:
> On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
>>
>> Yeah, but the debug check is triggering worse behavior, requiring
>> bisecting back to the debug commit.
>
> Yes, it is.
>
> So I'm wondering is anyone is working on fixing this in any way?
>
> It kind of sucks when this is happening on an otherwise perfectly usable
> old(ish) machine ...

The WARN() was already changed to a WARN_ONCE().

So that debug check doesn't cause problems any more. If somebody is
bisecting something else, and the WARN() is a problem for those
intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
should get you past that point.

IOW, this really shouldn't be an issue.

Does the pccard thing still not work?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Rafael J. Wysocki

On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
> On 01/21/2015 05:12 PM, Davidlohr Bueso wrote:
> > On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
> >> On Wed, 21 January 2015 Bruno Prémont wrote:
> >>> On Tue, 20 January 2015 Linus Torvalds wrote:
>  On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> >
> > No idea yet which rc is the offender (nor exact patch), but on my not
> > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > converting my laptop into a heater.
> 
> [...]
> 
> >> Bisecting to the end did point me at (the warning traces produced in great
> >> quantities might not be the very same issue as the abusive CPU usage, but
> >> certainly look very related):
> >>   [CCing people on CC for the patch]
> >>
> >> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
> >> Author: Peter Zijlstra 
> >> Date:   Wed Sep 24 10:18:55 2014 +0200
> 
> [...]
> 
> >> Which does produce the following trace (hand-copied most important parts 
> >> of it):
> >>   Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 
> >> __might_sleep+0x143/0x170
> >>   do not call blocking ops when !TASK_RUNNING; state=1 set at [] 
> >> pccardd+0xa0/0x3e0
> >>   ...
> >>   Call trace:
> >> ...
> >> __might_sleep+0x143/0x170
> >> ? pccardd+0xa0/0x3e0
> >> ? pccardd+0xa0/0x3e0
> >> mutex_lock+0x17/0x2a
> >> pccardd+0xe9/0x3e0
> >> ? pcmcia_socket_uevent+0x30/0x30
> >>
> >> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the 
> >> structure
> >> Peter's patch wants to warn about.
> > 
> > Yeah setting current to interruptable so early in the game is bogus. It
> > should be set after unlocking the skt_mutex.
> 
> Yeah, but the debug check is triggering worse behavior, requiring
> bisecting back to the debug commit.

Yes, it is.

So I'm wondering is anyone is working on fixing this in any way?

It kind of sucks when this is happening on an otherwise perfectly usable
old(ish) machine ...

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Linus Torvalds

On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 The WARN() was already changed to a WARN_ONCE().

Oh, but I notice that the __set_current_state(TASK_RUNNING) ends up
always happening.

So I think the right fix is to:

 - warn once like we do

 - but *not* do that __set_current_state() which was always total crap anyway

Why do I say total crap? Because of two independent issues:

 (a) it actually changes behavior for a debug vs non-debug kernel,
which is a really bad idea to begin with

 (b) it's really wrong. The whole nested sleep case was never a
major bug to begin with, just a possible inefficiency where constant
nested sleeps would possibly make the outer sleep not sleep. But that
could possibly make case was the unlikely case, and the debug patch
made it happen *all* the time by explicitly setting things running.

So I think the proper patch is the attached.

The comment is also crap. The comment says

Blocking primitives will set (and therefore destroy) current-state [...]

but the reality is that they *may* set it, and only in the unlikely
slow-path where they actually block.

So doing this in __may_sleep() is just bogus and horrible horrible
crap. It turns the harmless ugliness into a real *harmful* bug. The
key word of __may_sleep() is that MAY part. It's a debug thing to
make relatively rare cases show up.

PeterZ, please don't make debugging patches like this. Ever again.
Because this was just stupid, and it took me too long to realize that
despite the warning being shut up, the debug patch was still actively
doing bad bad things.

Ingo, maybe you'd want to apply this through the scheduler tree, the
way you already did the WARN_ONCE() thing.

Bruno, does this finally actually fix your pccard thing?

  Linus
 kernel/sched/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0accc00566e..76aaf5c61114 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7292,13 +7292,12 @@ void __might_sleep(const char *file, int line, int 
preempt_offset)
 * since we will exit with TASK_RUNNING make sure we enter with it,
 * otherwise we will destroy state.
 */
-   if (WARN_ONCE(current-state != TASK_RUNNING,
+   WARN_ONCE(current-state != TASK_RUNNING,
do not call blocking ops when !TASK_RUNNING; 
state=%lx set at [%p] %pS\n,
current-state,
(void *)current-task_state_change,
-   (void *)current-task_state_change))
-   __set_current_state(TASK_RUNNING);
+   (void *)current-task_state_change);
 
___might_sleep(file, line, preempt_offset);
 }

Re: Linux 3.19-rc5

2015-01-29 Thread Linus Torvalds

On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:

 Yeah, but the debug check is triggering worse behavior, requiring
 bisecting back to the debug commit.

 Yes, it is.

 So I'm wondering is anyone is working on fixing this in any way?

 It kind of sucks when this is happening on an otherwise perfectly usable
 old(ish) machine ...

The WARN() was already changed to a WARN_ONCE().

So that debug check doesn't cause problems any more. If somebody is
bisecting something else, and the WARN() is a problem for those
intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
should get you past that point.

IOW, this really shouldn't be an issue.

Does the pccard thing still not work?

   Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Rafael J. Wysocki

On Thursday, January 29, 2015 05:12:11 PM Linus Torvalds wrote:
 On Thu, Jan 29, 2015 at 5:22 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:
  On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
 
  Yeah, but the debug check is triggering worse behavior, requiring
  bisecting back to the debug commit.
 
  Yes, it is.
 
  So I'm wondering is anyone is working on fixing this in any way?
 
  It kind of sucks when this is happening on an otherwise perfectly usable
  old(ish) machine ...
 
 The WARN() was already changed to a WARN_ONCE().
 
 So that debug check doesn't cause problems any more. If somebody is
 bisecting something else, and the WARN() is a problem for those
 intermediate kernels, then just disabling CONFIG_DEBUG_ATOMIC_SLEEP
 should get you past that point.
 
 IOW, this really shouldn't be an issue.
 
 Does the pccard thing still not work?

Interestingly enough, if the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP
unset, the problem with 99+% CPU load from pccardd goes away, so thanks for
the hint.

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Linus Torvalds

On Thu, Jan 29, 2015 at 5:25 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Ingo, maybe you'd want to apply this through the scheduler tree, the
 way you already did the WARN_ONCE() thing.

Side note: I can obviously just commit it myself, but for things that
have obvious maintainers I tend to try to try to push the changes
through channels.

Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Rafael J. Wysocki

On Wednesday, January 21, 2015 05:54:00 PM Peter Hurley wrote:
 On 01/21/2015 05:12 PM, Davidlohr Bueso wrote:
  On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
  On Wed, 21 January 2015 Bruno Prémont wrote:
  On Tue, 20 January 2015 Linus Torvalds wrote:
  On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
 
  No idea yet which rc is the offender (nor exact patch), but on my not
  so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
  converting my laptop into a heater.
 
 [...]
 
  Bisecting to the end did point me at (the warning traces produced in great
  quantities might not be the very same issue as the abusive CPU usage, but
  certainly look very related):
[CCing people on CC for the patch]
 
  commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
  Author: Peter Zijlstra pet...@infradead.org
  Date:   Wed Sep 24 10:18:55 2014 +0200
 
 [...]
 
  Which does produce the following trace (hand-copied most important parts 
  of it):
Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 
  __might_sleep+0x143/0x170
do not call blocking ops when !TASK_RUNNING; state=1 set at [c1436390] 
  pccardd+0xa0/0x3e0
...
Call trace:
  ...
  __might_sleep+0x143/0x170
  ? pccardd+0xa0/0x3e0
  ? pccardd+0xa0/0x3e0
  mutex_lock+0x17/0x2a
  pccardd+0xe9/0x3e0
  ? pcmcia_socket_uevent+0x30/0x30
 
  pccardd() is located in drivers/pcmcia/cs.c and seems to be of the 
  structure
  Peter's patch wants to warn about.
  
  Yeah setting current to interruptable so early in the game is bogus. It
  should be set after unlocking the skt_mutex.
 
 Yeah, but the debug check is triggering worse behavior, requiring
 bisecting back to the debug commit.

Yes, it is.

So I'm wondering is anyone is working on fixing this in any way?

It kind of sucks when this is happening on an otherwise perfectly usable
old(ish) machine ...

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-29 Thread Rafael J. Wysocki

On Thursday, January 29, 2015 05:25:07 PM Linus Torvalds wrote:
 On Thu, Jan 29, 2015 at 5:12 PM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 
  The WARN() was already changed to a WARN_ONCE().
 
 Oh, but I notice that the __set_current_state(TASK_RUNNING) ends up
 always happening.
 
 So I think the right fix is to:
 
  - warn once like we do
 
  - but *not* do that __set_current_state() which was always total crap anyway
 
 Why do I say total crap? Because of two independent issues:
 
  (a) it actually changes behavior for a debug vs non-debug kernel,
 which is a really bad idea to begin with
 
  (b) it's really wrong. The whole nested sleep case was never a
 major bug to begin with, just a possible inefficiency where constant
 nested sleeps would possibly make the outer sleep not sleep. But that
 could possibly make case was the unlikely case, and the debug patch
 made it happen *all* the time by explicitly setting things running.
 
 So I think the proper patch is the attached.

I applied the patch and compiled the kernel with CONFIG_DEBUG_ATOMIC_SLEEP
set again.  The warning is back, so the DEBUG_ATOMIC_SLEEP thing appears to
be working (and I don't really care about the freaking warning anyway), but the
pccardd load issue is gone, so the patch fixes the problem for me.  Thanks!

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-21 Thread Peter Hurley

On 01/21/2015 05:12 PM, Davidlohr Bueso wrote:
> On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
>> On Wed, 21 January 2015 Bruno Prémont wrote:
>>> On Tue, 20 January 2015 Linus Torvalds wrote:
 On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
>
> No idea yet which rc is the offender (nor exact patch), but on my not
> so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> converting my laptop into a heater.

[...]

>> Bisecting to the end did point me at (the warning traces produced in great
>> quantities might not be the very same issue as the abusive CPU usage, but
>> certainly look very related):
>>   [CCing people on CC for the patch]
>>
>> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
>> Author: Peter Zijlstra 
>> Date:   Wed Sep 24 10:18:55 2014 +0200

[...]

>> Which does produce the following trace (hand-copied most important parts of 
>> it):
>>   Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 
>> __might_sleep+0x143/0x170
>>   do not call blocking ops when !TASK_RUNNING; state=1 set at [] 
>> pccardd+0xa0/0x3e0
>>   ...
>>   Call trace:
>> ...
>> __might_sleep+0x143/0x170
>> ? pccardd+0xa0/0x3e0
>> ? pccardd+0xa0/0x3e0
>> mutex_lock+0x17/0x2a
>> pccardd+0xe9/0x3e0
>> ? pcmcia_socket_uevent+0x30/0x30
>>
>> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
>> Peter's patch wants to warn about.
> 
> Yeah setting current to interruptable so early in the game is bogus. It
> should be set after unlocking the skt_mutex.

Yeah, but the debug check is triggering worse behavior, requiring
bisecting back to the debug commit.

How does the might_sleep() check here guarantee the task won't sleep?

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-21 Thread Davidlohr Bueso

On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
> On Wed, 21 January 2015 Bruno Prémont wrote:
> > On Tue, 20 January 2015 Linus Torvalds wrote:
> > > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> > > >
> > > > No idea yet which rc is the offender (nor exact patch), but on my not
> > > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > > > converting my laptop into a heater.
> > > >
> > > > lspci for affected nodes:
> > > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
> > > > Controller [1217:7113] (rev 20)
> > > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
> > > > Controller [1217:7113] (rev 20)
> > > >
> > > > Very basics I have, before I attempt any bisection:
> > > 
> > > Hmm. I'm not seeing anything recent changing anything in this area, so
> > > I suspect that unless somebody else steps up and says "Ahh, that
> > > sounds like xyz", your bisection is the best option.
> 
> Bisecting to the end did point me at (the warning traces produced in great
> quantities might not be the very same issue as the abusive CPU usage, but
> certainly look very related):
>   [CCing people on CC for the patch]
> 
> commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
> Author: Peter Zijlstra 
> Date:   Wed Sep 24 10:18:55 2014 +0200
> 
> sched: Debug nested sleeps
> 
> Validate we call might_sleep() with TASK_RUNNING, which catches places
> where we nest blocking primitives, eg. mutex usage in a wait loop.
> 
> Since all blocking is arranged through task_struct::state, nesting
> this will cause the inner primitive to set TASK_RUNNING and the outer
> will thus not block.
> 
> Another observed problem is calling a blocking function from
> schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
> then destroy the task state for the actual __schedule() call that
> comes after it.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: t...@linutronix.de
> Cc: ilya.dryo...@inktank.com
> Cc: umgwanakikb...@gmail.com
> Cc: o...@redhat.com
> Cc: Linus Torvalds 
> Link: http://lkml.kernel.org/r/20140924082242.591637...@infradead.org
> Signed-off-by: Ingo Molnar 
> 
> Which does produce the following trace (hand-copied most important parts of 
> it):
>   Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
>   do not call blocking ops when !TASK_RUNNING; state=1 set at [] 
> pccardd+0xa0/0x3e0
>   ...
>   Call trace:
> ...
> __might_sleep+0x143/0x170
> ? pccardd+0xa0/0x3e0
> ? pccardd+0xa0/0x3e0
> mutex_lock+0x17/0x2a
> pccardd+0xe9/0x3e0
> ? pcmcia_socket_uevent+0x30/0x30
> 
> pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
> Peter's patch wants to warn about.

Yeah setting current to interruptable so early in the game is bogus. It
should be set after unlocking the skt_mutex.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-21 Thread Bruno Prémont

On Wed, 21 January 2015 Bruno Prémont wrote:
> On Tue, 20 January 2015 Linus Torvalds wrote:
> > On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> > >
> > > No idea yet which rc is the offender (nor exact patch), but on my not
> > > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > > converting my laptop into a heater.
> > >
> > > lspci for affected nodes:
> > > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
> > > Controller [1217:7113] (rev 20)
> > > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
> > > Controller [1217:7113] (rev 20)
> > >
> > > Very basics I have, before I attempt any bisection:
> > 
> > Hmm. I'm not seeing anything recent changing anything in this area, so
> > I suspect that unless somebody else steps up and says "Ahh, that
> > sounds like xyz", your bisection is the best option.

Bisecting to the end did point me at (the warning traces produced in great
quantities might not be the very same issue as the abusive CPU usage, but
certainly look very related):
  [CCing people on CC for the patch]

commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
Author: Peter Zijlstra 
Date:   Wed Sep 24 10:18:55 2014 +0200

sched: Debug nested sleeps

Validate we call might_sleep() with TASK_RUNNING, which catches places
where we nest blocking primitives, eg. mutex usage in a wait loop.

Since all blocking is arranged through task_struct::state, nesting
this will cause the inner primitive to set TASK_RUNNING and the outer
will thus not block.

Another observed problem is calling a blocking function from
schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
then destroy the task state for the actual __schedule() call that
comes after it.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: t...@linutronix.de
Cc: ilya.dryo...@inktank.com
Cc: umgwanakikb...@gmail.com
Cc: o...@redhat.com
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/20140924082242.591637...@infradead.org
Signed-off-by: Ingo Molnar 

Which does produce the following trace (hand-copied most important parts of it):
  Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
  do not call blocking ops when !TASK_RUNNING; state=1 set at [] 
pccardd+0xa0/0x3e0
  ...
  Call trace:
...
__might_sleep+0x143/0x170
? pccardd+0xa0/0x3e0
? pccardd+0xa0/0x3e0
mutex_lock+0x17/0x2a
pccardd+0xe9/0x3e0
? pcmcia_socket_uevent+0x30/0x30

pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
Peter's patch wants to warn about.

Bruno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-21 Thread Bruno Prémont

On Tue, 20 January 2015 Linus Torvalds wrote:
> On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
> >
> > No idea yet which rc is the offender (nor exact patch), but on my not
> > so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> > converting my laptop into a heater.
> >
> > lspci for affected nodes:
> > 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
> > Controller [1217:7113] (rev 20)
> > 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
> > Controller [1217:7113] (rev 20)
> >
> > Very basics I have, before I attempt any bisection:
> 
> Hmm. I'm not seeing anything recent changing anything in this area, so
> I suspect that unless somebody else steps up and says "Ahh, that
> sounds like xyz", your bisection is the best option.

I've done most of the bisection and ended up in the scheduler changes.
CCing Peter.

A few iterations from the end kernel is warning about (might?)sleeping during
locking or so (scrolling too fast to read end making access to console
hard).

My current bisection log is:
git bisect start
# bad: [117af36e3bceb4fceb1c63489d6f3d94ed259a4c] Apple-GMUX: inhibit VGA 
arbitration changes
git bisect bad 117af36e3bceb4fceb1c63489d6f3d94ed259a4c
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d
# bad: [c0222ac086669a631814bbf857f8c8023452a4d7] Merge branch 'upstream' of 
git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect bad c0222ac086669a631814bbf857f8c8023452a4d7
# bad: [2183a58803c2bbd87c2d0057eed6779ec4718d4d] Merge tag 'media/v3.19-rc1' 
of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect bad 2183a58803c2bbd87c2d0057eed6779ec4718d4d
# good: [6da314122ddc11936c6f054753bbb956a499d020] Merge tag 'dt-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 6da314122ddc11936c6f054753bbb956a499d020
# bad: [cbfe0de303a55ed96d8831c2d5f56f8131cd6612] Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad cbfe0de303a55ed96d8831c2d5f56f8131cd6612
# bad: [9e66645d72d3c395da92b0f8855c787f4b5f0e89] Merge branch 
'irq-irqdomain-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 9e66645d72d3c395da92b0f8855c787f4b5f0e89
# good: [5706ffd045c3810912c4982357d7daa721af3464] Merge branch 
'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 5706ffd045c3810912c4982357d7daa721af3464
# bad: [a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa] Merge branch 
'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa
# bad: [acb32132ec0433c03bed750f3e9508dc29db0328] sched/deadline: Add deadline 
rq status print
git bisect bad acb32132ec0433c03bed750f3e9508dc29db0328
# good: [1029a2b52c09e479fd7b07275812ad97868c0fb0] sched, exit: Deal with 
nested sleeps
git bisect good 1029a2b52c09e479fd7b07275812ad97868c0fb0


That means, the remaining commits would be:
 commit acb32132ec0433c03bed750f3e9508dc29db0328
 Author: Wanpeng Li 
 Date:   Fri Oct 31 06:39:33 2014 +0800

 sched/deadline: Add deadline rq status print

 ...

 commit e23738a7300a7591a57a22f47b813fd1b53ec404
 Author: Peter Zijlstra 
 Date:   Wed Sep 24 10:18:50 2014 +0200

 sched, inotify: Deal with nested sleeps


It looks like pccardd might be doing the wrong thing for
locking/sleeping/waiting.


Trying to complete bisection.

Bruno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-21 Thread Davidlohr Bueso

On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
 On Wed, 21 January 2015 Bruno Prémont wrote:
  On Tue, 20 January 2015 Linus Torvalds wrote:
   On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
   
No idea yet which rc is the offender (nor exact patch), but on my not
so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
converting my laptop into a heater.
   
lspci for affected nodes:
02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
Controller [1217:7113] (rev 20)
02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
Controller [1217:7113] (rev 20)
   
Very basics I have, before I attempt any bisection:
   
   Hmm. I'm not seeing anything recent changing anything in this area, so
   I suspect that unless somebody else steps up and says Ahh, that
   sounds like xyz, your bisection is the best option.
 
 Bisecting to the end did point me at (the warning traces produced in great
 quantities might not be the very same issue as the abusive CPU usage, but
 certainly look very related):
   [CCing people on CC for the patch]
 
 commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
 Author: Peter Zijlstra pet...@infradead.org
 Date:   Wed Sep 24 10:18:55 2014 +0200
 
 sched: Debug nested sleeps
 
 Validate we call might_sleep() with TASK_RUNNING, which catches places
 where we nest blocking primitives, eg. mutex usage in a wait loop.
 
 Since all blocking is arranged through task_struct::state, nesting
 this will cause the inner primitive to set TASK_RUNNING and the outer
 will thus not block.
 
 Another observed problem is calling a blocking function from
 schedule()-sched_submit_work()-blk_schedule_flush_plug() which will
 then destroy the task state for the actual __schedule() call that
 comes after it.
 
 Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
 Cc: t...@linutronix.de
 Cc: ilya.dryo...@inktank.com
 Cc: umgwanakikb...@gmail.com
 Cc: o...@redhat.com
 Cc: Linus Torvalds torva...@linux-foundation.org
 Link: http://lkml.kernel.org/r/20140924082242.591637...@infradead.org
 Signed-off-by: Ingo Molnar mi...@kernel.org
 
 Which does produce the following trace (hand-copied most important parts of 
 it):
   Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
   do not call blocking ops when !TASK_RUNNING; state=1 set at [c1436390] 
 pccardd+0xa0/0x3e0
   ...
   Call trace:
 ...
 __might_sleep+0x143/0x170
 ? pccardd+0xa0/0x3e0
 ? pccardd+0xa0/0x3e0
 mutex_lock+0x17/0x2a
 pccardd+0xe9/0x3e0
 ? pcmcia_socket_uevent+0x30/0x30
 
 pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
 Peter's patch wants to warn about.

Yeah setting current to interruptable so early in the game is bogus. It
should be set after unlocking the skt_mutex.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-21 Thread Bruno Prémont

On Wed, 21 January 2015 Bruno Prémont wrote:
 On Tue, 20 January 2015 Linus Torvalds wrote:
  On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
  
   No idea yet which rc is the offender (nor exact patch), but on my not
   so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
   converting my laptop into a heater.
  
   lspci for affected nodes:
   02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
   Controller [1217:7113] (rev 20)
   02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
   Controller [1217:7113] (rev 20)
  
   Very basics I have, before I attempt any bisection:
  
  Hmm. I'm not seeing anything recent changing anything in this area, so
  I suspect that unless somebody else steps up and says Ahh, that
  sounds like xyz, your bisection is the best option.

Bisecting to the end did point me at (the warning traces produced in great
quantities might not be the very same issue as the abusive CPU usage, but
certainly look very related):
  [CCing people on CC for the patch]

commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
Author: Peter Zijlstra pet...@infradead.org
Date:   Wed Sep 24 10:18:55 2014 +0200

sched: Debug nested sleeps

Validate we call might_sleep() with TASK_RUNNING, which catches places
where we nest blocking primitives, eg. mutex usage in a wait loop.

Since all blocking is arranged through task_struct::state, nesting
this will cause the inner primitive to set TASK_RUNNING and the outer
will thus not block.

Another observed problem is calling a blocking function from
schedule()-sched_submit_work()-blk_schedule_flush_plug() which will
then destroy the task state for the actual __schedule() call that
comes after it.

Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org
Cc: t...@linutronix.de
Cc: ilya.dryo...@inktank.com
Cc: umgwanakikb...@gmail.com
Cc: o...@redhat.com
Cc: Linus Torvalds torva...@linux-foundation.org
Link: http://lkml.kernel.org/r/20140924082242.591637...@infradead.org
Signed-off-by: Ingo Molnar mi...@kernel.org

Which does produce the following trace (hand-copied most important parts of it):
  Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 __might_sleep+0x143/0x170
  do not call blocking ops when !TASK_RUNNING; state=1 set at [c1436390] 
pccardd+0xa0/0x3e0
  ...
  Call trace:
...
__might_sleep+0x143/0x170
? pccardd+0xa0/0x3e0
? pccardd+0xa0/0x3e0
mutex_lock+0x17/0x2a
pccardd+0xe9/0x3e0
? pcmcia_socket_uevent+0x30/0x30

pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
Peter's patch wants to warn about.


Bruno
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-21 Thread Peter Hurley

On 01/21/2015 05:12 PM, Davidlohr Bueso wrote:
 On Wed, 2015-01-21 at 22:37 +0100, Bruno Prémont wrote:
 On Wed, 21 January 2015 Bruno Prémont wrote:
 On Tue, 20 January 2015 Linus Torvalds wrote:
 On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:

 No idea yet which rc is the offender (nor exact patch), but on my not
 so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
 converting my laptop into a heater.

[...]

 Bisecting to the end did point me at (the warning traces produced in great
 quantities might not be the very same issue as the abusive CPU usage, but
 certainly look very related):
   [CCing people on CC for the patch]

 commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
 Author: Peter Zijlstra pet...@infradead.org
 Date:   Wed Sep 24 10:18:55 2014 +0200

[...]

 Which does produce the following trace (hand-copied most important parts of 
 it):
   Warning: CPU 0 PID: 68 at kernel/sched/core.c:7311 
 __might_sleep+0x143/0x170
   do not call blocking ops when !TASK_RUNNING; state=1 set at [c1436390] 
 pccardd+0xa0/0x3e0
   ...
   Call trace:
 ...
 __might_sleep+0x143/0x170
 ? pccardd+0xa0/0x3e0
 ? pccardd+0xa0/0x3e0
 mutex_lock+0x17/0x2a
 pccardd+0xe9/0x3e0
 ? pcmcia_socket_uevent+0x30/0x30

 pccardd() is located in drivers/pcmcia/cs.c and seems to be of the structure
 Peter's patch wants to warn about.
 
 Yeah setting current to interruptable so early in the game is bogus. It
 should be set after unlocking the skt_mutex.

Yeah, but the debug check is triggering worse behavior, requiring
bisecting back to the debug commit.

How does the might_sleep() check here guarantee the task won't sleep?

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-21 Thread Bruno Prémont

On Tue, 20 January 2015 Linus Torvalds wrote:
 On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont wrote:
 
  No idea yet which rc is the offender (nor exact patch), but on my not
  so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
  converting my laptop into a heater.
 
  lspci for affected nodes:
  02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
  Controller [1217:7113] (rev 20)
  02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
  Controller [1217:7113] (rev 20)
 
  Very basics I have, before I attempt any bisection:
 
 Hmm. I'm not seeing anything recent changing anything in this area, so
 I suspect that unless somebody else steps up and says Ahh, that
 sounds like xyz, your bisection is the best option.

I've done most of the bisection and ended up in the scheduler changes.
CCing Peter.

A few iterations from the end kernel is warning about (might?)sleeping during
locking or so (scrolling too fast to read end making access to console
hard).

My current bisection log is:
git bisect start
# bad: [117af36e3bceb4fceb1c63489d6f3d94ed259a4c] Apple-GMUX: inhibit VGA 
arbitration changes
git bisect bad 117af36e3bceb4fceb1c63489d6f3d94ed259a4c
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d
# bad: [c0222ac086669a631814bbf857f8c8023452a4d7] Merge branch 'upstream' of 
git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect bad c0222ac086669a631814bbf857f8c8023452a4d7
# bad: [2183a58803c2bbd87c2d0057eed6779ec4718d4d] Merge tag 'media/v3.19-rc1' 
of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect bad 2183a58803c2bbd87c2d0057eed6779ec4718d4d
# good: [6da314122ddc11936c6f054753bbb956a499d020] Merge tag 'dt-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 6da314122ddc11936c6f054753bbb956a499d020
# bad: [cbfe0de303a55ed96d8831c2d5f56f8131cd6612] Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad cbfe0de303a55ed96d8831c2d5f56f8131cd6612
# bad: [9e66645d72d3c395da92b0f8855c787f4b5f0e89] Merge branch 
'irq-irqdomain-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 9e66645d72d3c395da92b0f8855c787f4b5f0e89
# good: [5706ffd045c3810912c4982357d7daa721af3464] Merge branch 
'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 5706ffd045c3810912c4982357d7daa721af3464
# bad: [a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa] Merge branch 
'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad a157508c9790ccd1c8b5c6a828d6ba85bbe95aaa
# bad: [acb32132ec0433c03bed750f3e9508dc29db0328] sched/deadline: Add deadline 
rq status print
git bisect bad acb32132ec0433c03bed750f3e9508dc29db0328
# good: [1029a2b52c09e479fd7b07275812ad97868c0fb0] sched, exit: Deal with 
nested sleeps
git bisect good 1029a2b52c09e479fd7b07275812ad97868c0fb0


That means, the remaining commits would be:
 commit acb32132ec0433c03bed750f3e9508dc29db0328
 Author: Wanpeng Li wanpeng...@linux.intel.com
 Date:   Fri Oct 31 06:39:33 2014 +0800

 sched/deadline: Add deadline rq status print

 ...

 commit e23738a7300a7591a57a22f47b813fd1b53ec404
 Author: Peter Zijlstra pet...@infradead.org
 Date:   Wed Sep 24 10:18:50 2014 +0200

 sched, inotify: Deal with nested sleeps


It looks like pccardd might be doing the wrong thing for
locking/sleeping/waiting.


Trying to complete bisection.

Bruno
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-19 Thread Linus Torvalds

On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont
 wrote:
>
> No idea yet which rc is the offender (nor exact patch), but on my not
> so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
> converting my laptop into a heater.
>
> lspci for affected nodes:
> 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
> Controller [1217:7113] (rev 20)
> 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
> Controller [1217:7113] (rev 20)
>
> Very basics I have, before I attempt any bisection:

Hmm. I'm not seeing anything recent changing anything in this area, so
I suspect that unless somebody else steps up and says "Ahh, that
sounds like xyz", your bisection is the best option.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-19 Thread Bruno Prémont

On Sun, 18 January 2015 Linus Torvalds  wrote:
> Another week, another -rc.
> 
> Fairly normal release, although I'd wish that by rc5 we'd have calmed
> down even further.  But no, with some of the driver tree merges in
> particular, this is actually larger than rc4 was.
> 
> That said, it's not like there is anything particularly scary in here.
> 
> The arm64 vm bug that I mentioned as pending in the rc4 notes got
> fixed within a day of that previous rc release, and the rest looks
> pretty standard. Mostly drivers (networking, usb, scsi target, block
> layer, mmc,  tty etc), but also arch updates (arm, x86, s390 and some
> tiny powerpc fixes), some filesystem updates (fuse and nfs), tracing
> fixes, and some perf tooling fixes.
> 
> Shortlog with the details appended.
> 
> Go forth and test.

No idea yet which rc is the offender (nor exact patch), but on my not
so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
converting my laptop into a heater.

lspci for affected nodes:
02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller 
[1217:7113] (rev 20)
02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller 
[1217:7113] (rev 20)


Very basics I have, before I attempt any bisection:

cat /proc/$(pidof pccardd)/stack
[] pccardd+0x1a5/0x3e0
[] kthread+0xa0/0xc0
[] ret_from_kernel_thread+0x20/0x30
[] 0x


Doing objdump on drivers/pcmcia/cs.o in which pccardd() is defined
it seems pccardd is stuck in try_to_freeze_unsafe().

Does this kind of behavior ring a bell?

I can unbind the two yenta_cardbus devices from yenta_cardbus to stop
the CPU usage.


Possibly important config option:
# CONFIG_SMP is not set




Objdump below:

static int pccardd(void *__skt)
{
 ca0:   55  push   %ebp
 ca1:   89 e5   mov%esp,%ebp
 ca3:   57  push   %edi
 ca4:   56  push   %esi
 ca5:   53  push   %ebx
 ca6:   89 c3   mov%eax,%ebx
 ca8:   83 ec 24sub$0x24,%esp

DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)
{
return this_cpu_read_stable(current_task);
 cab:   a1 00 00 00 00  mov0x0,%eax
struct pcmcia_socket *skt = __skt;
int ret;

skt->thread = current;
 cb0:   89 83 e4 00 00 00   mov%eax,0xe4(%ebx)
skt->socket = dead_socket;
 cb6:   a1 00 00 00 00  mov0x0,%eax
skt->ops->init(skt);
 cbb:   8b 93 cc 00 00 00   mov0xcc(%ebx),%edx
{
struct pcmcia_socket *skt = __skt;
int ret;

skt->thread = current;
skt->socket = dead_socket;
 cc1:   89 43 04mov%eax,0x4(%ebx)
 cc4:   a1 04 00 00 00  mov0x4,%eax
 cc9:   89 43 08mov%eax,0x8(%ebx)
 ccc:   a1 08 00 00 00  mov0x8,%eax
 cd1:   89 43 0cmov%eax,0xc(%ebx)
skt->ops->init(skt);
 cd4:   89 d8   mov%ebx,%eax
 cd6:   ff 12   call   *(%edx)
skt->ops->set_socket(skt, >socket);
 cd8:   8b 8b cc 00 00 00   mov0xcc(%ebx),%ecx
 cde:   8d 53 04lea0x4(%ebx),%edx
 ce1:   89 d8   mov%ebx,%eax
 ce3:   ff 51 0ccall   *0xc(%ecx)

/* register with the device core */
ret = device_register(>dev);
 ce6:   8d 83 2c 01 00 00   lea0x12c(%ebx),%eax
 cec:   89 45 e0mov%eax,-0x20(%ebp)
 cef:   e8 fc ff ff ff  call   cf0 
if (ret) {
 cf4:   85 c0   test   %eax,%eax
 cf6:   0f 85 d4 02 00 00   jnefd0 
   "PCMCIA: unable to register socket\n");
skt->thread = NULL;
complete(>thread_done);
return 0;
}
ret = pccard_sysfs_add_socket(>dev);
 cfc:   8b 45 e0mov-0x20(%ebp),%eax
 cff:   e8 fc ff ff ff  call   d00 
if (ret)
 d04:   85 c0   test   %eax,%eax
   "PCMCIA: unable to register socket\n");
skt->thread = NULL;
complete(>thread_done);
return 0;
}
ret = pccard_sysfs_add_socket(>dev);
 d06:   89 45 e4mov%eax,-0x1c(%ebp)
if (ret)
 d09:   0f 85 01 03 00 00   jne1010 
dev_warn(>dev, "err %d adding socket attributes\n", ret);

complete(>thread_done);
 d0f:   8d 83 e8 00 00 00   lea0xe8(%ebx),%eax
 d15:   e8 fc ff ff ff  call   d16 

/* wait for userspace to catch up */

Re: Linux 3.19-rc5

2015-01-19 Thread Linus Torvalds

On Tue, Jan 20, 2015 at 6:02 AM, Bruno Prémont
bonb...@linux-vserver.org wrote:

 No idea yet which rc is the offender (nor exact patch), but on my not
 so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
 converting my laptop into a heater.

 lspci for affected nodes:
 02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
 Controller [1217:7113] (rev 20)
 02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus 
 Controller [1217:7113] (rev 20)

 Very basics I have, before I attempt any bisection:

Hmm. I'm not seeing anything recent changing anything in this area, so
I suspect that unless somebody else steps up and says Ahh, that
sounds like xyz, your bisection is the best option.

   Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.19-rc5

2015-01-19 Thread Bruno Prémont

On Sun, 18 January 2015 Linus Torvalds torva...@linux-foundation.org wrote:
 Another week, another -rc.
 
 Fairly normal release, although I'd wish that by rc5 we'd have calmed
 down even further.  But no, with some of the driver tree merges in
 particular, this is actually larger than rc4 was.
 
 That said, it's not like there is anything particularly scary in here.
 
 The arm64 vm bug that I mentioned as pending in the rc4 notes got
 fixed within a day of that previous rc release, and the rest looks
 pretty standard. Mostly drivers (networking, usb, scsi target, block
 layer, mmc,  tty etc), but also arch updates (arm, x86, s390 and some
 tiny powerpc fixes), some filesystem updates (fuse and nfs), tracing
 fixes, and some perf tooling fixes.
 
 Shortlog with the details appended.
 
 Go forth and test.

No idea yet which rc is the offender (nor exact patch), but on my not
so recent UP laptop with a pccard slot I have 2 pccardd kernel threads
converting my laptop into a heater.

lspci for affected nodes:
02:06.0 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller 
[1217:7113] (rev 20)
02:06.1 CardBus bridge [0607]: O2 Micro, Inc. OZ711EC1 SmartCardBus Controller 
[1217:7113] (rev 20)


Very basics I have, before I attempt any bisection:

cat /proc/$(pidof pccardd)/stack
[c143ec95] pccardd+0x1a5/0x3e0
[c10543a0] kthread+0xa0/0xc0
[c16cd840] ret_from_kernel_thread+0x20/0x30
[] 0x


Doing objdump on drivers/pcmcia/cs.o in which pccardd() is defined
it seems pccardd is stuck in try_to_freeze_unsafe().

Does this kind of behavior ring a bell?

I can unbind the two yenta_cardbus devices from yenta_cardbus to stop
the CPU usage.


Possibly important config option:
# CONFIG_SMP is not set




Objdump below:

static int pccardd(void *__skt)
{
 ca0:   55  push   %ebp
 ca1:   89 e5   mov%esp,%ebp
 ca3:   57  push   %edi
 ca4:   56  push   %esi
 ca5:   53  push   %ebx
 ca6:   89 c3   mov%eax,%ebx
 ca8:   83 ec 24sub$0x24,%esp

DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)
{
return this_cpu_read_stable(current_task);
 cab:   a1 00 00 00 00  mov0x0,%eax
struct pcmcia_socket *skt = __skt;
int ret;

skt-thread = current;
 cb0:   89 83 e4 00 00 00   mov%eax,0xe4(%ebx)
skt-socket = dead_socket;
 cb6:   a1 00 00 00 00  mov0x0,%eax
skt-ops-init(skt);
 cbb:   8b 93 cc 00 00 00   mov0xcc(%ebx),%edx
{
struct pcmcia_socket *skt = __skt;
int ret;

skt-thread = current;
skt-socket = dead_socket;
 cc1:   89 43 04mov%eax,0x4(%ebx)
 cc4:   a1 04 00 00 00  mov0x4,%eax
 cc9:   89 43 08mov%eax,0x8(%ebx)
 ccc:   a1 08 00 00 00  mov0x8,%eax
 cd1:   89 43 0cmov%eax,0xc(%ebx)
skt-ops-init(skt);
 cd4:   89 d8   mov%ebx,%eax
 cd6:   ff 12   call   *(%edx)
skt-ops-set_socket(skt, skt-socket);
 cd8:   8b 8b cc 00 00 00   mov0xcc(%ebx),%ecx
 cde:   8d 53 04lea0x4(%ebx),%edx
 ce1:   89 d8   mov%ebx,%eax
 ce3:   ff 51 0ccall   *0xc(%ecx)

/* register with the device core */
ret = device_register(skt-dev);
 ce6:   8d 83 2c 01 00 00   lea0x12c(%ebx),%eax
 cec:   89 45 e0mov%eax,-0x20(%ebp)
 cef:   e8 fc ff ff ff  call   cf0 pccardd+0x50
if (ret) {
 cf4:   85 c0   test   %eax,%eax
 cf6:   0f 85 d4 02 00 00   jnefd0 pccardd+0x330
   PCMCIA: unable to register socket\n);
skt-thread = NULL;
complete(skt-thread_done);
return 0;
}
ret = pccard_sysfs_add_socket(skt-dev);
 cfc:   8b 45 e0mov-0x20(%ebp),%eax
 cff:   e8 fc ff ff ff  call   d00 pccardd+0x60
if (ret)
 d04:   85 c0   test   %eax,%eax
   PCMCIA: unable to register socket\n);
skt-thread = NULL;
complete(skt-thread_done);
return 0;
}
ret = pccard_sysfs_add_socket(skt-dev);
 d06:   89 45 e4mov%eax,-0x1c(%ebp)
if (ret)
 d09:   0f 85 01 03 00 00   jne1010 pccardd+0x370
dev_warn(skt-dev, err %d adding socket attributes\n, ret);

complete(skt-thread_done);
 d0f:   8d 83 e8 00 00 00   lea0xe8(%ebx),%eax
 d15:

60 matches

Mail list logo