Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra

2014-08-07 Thread Peter Zijlstra
On Wed, Aug 06, 2014 at 02:16:03PM -0700, David Miller wrote:
> From: Peter Zijlstra 
> Date: Wed, 6 Aug 2014 10:31:34 +0200
> 
> > On Wed, Aug 06, 2014 at 11:51:29AM +0400, Ilya Dryomov wrote:
> > 
> >> OK, this one is a bit different.
> >> 
> >> WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 
> >> __might_sleep+0x58/0x90()
> >> do not call blocking ops when !TASK_RUNNING; state=1 set at 
> >> [] prepare_to_wait+0x50 /0xa0
> > 
> >>  [] __might_sleep+0x58/0x90
> >>  [] lock_sock_nested+0x31/0xb0
> >>  [] sk_stream_wait_memory+0x18a/0x2d0
> > 
> > Urgh, tedious. Its not an actual bug as is. Due to the condition check
> > in sk_wait_event() we can call lock_sock() with ->state != TASK_RUNNING.
> > 
> > I'm not entirely sure what the cleanest way is to make this go away.
> > Possibly something like so:
> 
> If you submit this formally to netdev with a signoff I'm willing to apply
> this if it helps the debug infrastructure.

Thanks, for now I'm just collecting things to see how far I can take
this. But I'll certainly include you and netdev on a next posting.


pgpkr5GUvDpWQ.pgp
Description: PGP signature


Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra

2014-08-06 Thread David Miller
From: Peter Zijlstra 
Date: Wed, 6 Aug 2014 10:31:34 +0200

> On Wed, Aug 06, 2014 at 11:51:29AM +0400, Ilya Dryomov wrote:
> 
>> OK, this one is a bit different.
>> 
>> WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 
>> __might_sleep+0x58/0x90()
>> do not call blocking ops when !TASK_RUNNING; state=1 set at 
>> [] prepare_to_wait+0x50 /0xa0
> 
>>  [] __might_sleep+0x58/0x90
>>  [] lock_sock_nested+0x31/0xb0
>>  [] sk_stream_wait_memory+0x18a/0x2d0
> 
> Urgh, tedious. Its not an actual bug as is. Due to the condition check
> in sk_wait_event() we can call lock_sock() with ->state != TASK_RUNNING.
> 
> I'm not entirely sure what the cleanest way is to make this go away.
> Possibly something like so:

If you submit this formally to netdev with a signoff I'm willing to apply
this if it helps the debug infrastructure.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra

2014-08-06 Thread Peter Zijlstra
On Wed, Aug 06, 2014 at 11:51:29AM +0400, Ilya Dryomov wrote:

> OK, this one is a bit different.
> 
> WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 
> __might_sleep+0x58/0x90()
> do not call blocking ops when !TASK_RUNNING; state=1 set at 
> [] prepare_to_wait+0x50 /0xa0

>  [] __might_sleep+0x58/0x90
>  [] lock_sock_nested+0x31/0xb0
>  [] sk_stream_wait_memory+0x18a/0x2d0

Urgh, tedious. Its not an actual bug as is. Due to the condition check
in sk_wait_event() we can call lock_sock() with ->state != TASK_RUNNING.

I'm not entirely sure what the cleanest way is to make this go away.
Possibly something like so:

---
 include/net/sock.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 156350745700..37902176c5ab 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -886,6 +886,7 @@ static inline void sock_rps_reset_rxhash(struct sock *sk)
if (!__rc) {\
*(__timeo) = schedule_timeout(*(__timeo));  \
}   \
+   __set_current_state(TASK_RUNNING);  \
lock_sock(__sk);\
__rc = __condition; \
__rc;   \


pgpbNe_DUVw5_.pgp
Description: PGP signature


Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra

2014-08-06 Thread Ilya Dryomov
On Tue, Aug 5, 2014 at 5:06 PM, Peter Zijlstra  wrote:
> On Tue, Aug 05, 2014 at 12:33:16PM +0400, Ilya Dryomov wrote:
>> On Mon, Aug 4, 2014 at 2:30 PM, Peter Zijlstra  wrote:
>> > Hi,
>> >
>> > Ilya recently tripped over a nested sleep which made Ingo suggest we should
>> > have debug checks for that. So I did some, see patch 7. Of course that
>> > triggered a whole bunch of fail the instant I tried to boot my machine.
>> >
>> > With this series I can boot my test box and build a kernel on it, I'm 
>> > fairly
>> > sure that's far too limited a test to have found all, but its a start.
>>
>> FWIW, I'm getting a lot of these during light rbd testing.  CC'ed
>> netdev and linux-mm.
>
> Both are cond_resched() calls, and that's not blocking as such, just a
> preemption point, so lets exclude those.

OK, this one is a bit different.

WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[] prepare_to_wait+0x50 /0xa0
Modules linked in:
CPU: 1 PID: 1744 Comm: lt-ceph_test_li Not tainted 3.16.0-vm+ #113
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 1bc0 88006c4479d8 8156f455 
 88006c447a28 88006c447a18 81033357 0001
  0950 817ee48a 88006dba6120
Call Trace:
 [] dump_stack+0x4f/0x7c
 [] warn_slowpath_common+0x87/0xb0
 [] warn_slowpath_fmt+0x41/0x50
 [] ? trace_hardirqs_on_caller+0x182/0x1f0
 [] ? prepare_to_wait+0x50/0xa0
 [] ? prepare_to_wait+0x50/0xa0
 [] __might_sleep+0x58/0x90
 [] lock_sock_nested+0x31/0xb0
 [] ? release_sock+0x1bb/0x200
 [] sk_stream_wait_memory+0x18a/0x2d0
 [] ? woken_wake_function+0x10/0x10
 [] tcp_sendmsg+0xb6f/0xd70
 [] inet_sendmsg+0xdf/0x100
 [] ? inet_recvmsg+0x100/0x100
 [] sock_sendmsg+0x67/0x90
 [] ? might_fault+0x51/0xb0
 [] ___sys_sendmsg+0x2d2/0x2e0
 [] ? do_dup2+0xd0/0xd0
 [] ? do_dup2+0xd0/0xd0
 [] ? finish_task_switch+0x50/0x100
 [] ? __fget_light+0x45/0x60
 [] ? __fdget+0xe/0x10
 [] __sys_sendmsg+0x44/0x70
 [] SyS_sendmsg+0x9/0x10
 [] system_call_fastpath+0x16/0x1b

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra

2014-08-05 Thread Peter Zijlstra
On Tue, Aug 05, 2014 at 12:33:16PM +0400, Ilya Dryomov wrote:
> On Mon, Aug 4, 2014 at 2:30 PM, Peter Zijlstra  wrote:
> > Hi,
> >
> > Ilya recently tripped over a nested sleep which made Ingo suggest we should
> > have debug checks for that. So I did some, see patch 7. Of course that
> > triggered a whole bunch of fail the instant I tried to boot my machine.
> >
> > With this series I can boot my test box and build a kernel on it, I'm fairly
> > sure that's far too limited a test to have found all, but its a start.
> 
> FWIW, I'm getting a lot of these during light rbd testing.  CC'ed
> netdev and linux-mm.

Both are cond_resched() calls, and that's not blocking as such, just a
preemption point, so lets exclude those.

From the school of '_' are free:

---
 include/linux/kernel.h |3 +++
 include/linux/sched.h  |6 +++---
 kernel/sched/core.c|   12 +---
 3 files changed, 15 insertions(+), 6 deletions(-)

--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -162,6 +162,7 @@ extern int _cond_resched(void);
 #endif
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+  void ___might_sleep(const char *file, int line, int preempt_offset);
   void __might_sleep(const char *file, int line, int preempt_offset);
 /**
  * might_sleep - annotation for functions that can sleep
@@ -176,6 +177,8 @@ extern int _cond_resched(void);
 # define might_sleep() \
do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
 #else
+  static inline void ___might_sleep(const char *file, int line,
+  int preempt_offset) { }
   static inline void __might_sleep(const char *file, int line,
   int preempt_offset) { }
 # define might_sleep() do { might_resched(); } while (0)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2754,7 +2754,7 @@ static inline int signal_pending_state(l
 extern int _cond_resched(void);
 
 #define cond_resched() ({  \
-   __might_sleep(__FILE__, __LINE__, 0);   \
+   ___might_sleep(__FILE__, __LINE__, 0);  \
_cond_resched();\
 })
 
@@ -2767,14 +2767,14 @@ extern int __cond_resched_lock(spinlock_
 #endif
 
 #define cond_resched_lock(lock) ({ \
-   __might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET); \
+   ___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
__cond_resched_lock(lock);  \
 })
 
 extern int __cond_resched_softirq(void);
 
 #define cond_resched_softirq() ({  \
-   __might_sleep(__FILE__, __LINE__, SOFTIRQ_DISABLE_OFFSET);  \
+   ___might_sleep(__FILE__, __LINE__, SOFTIRQ_DISABLE_OFFSET); \
__cond_resched_softirq();   \
 })
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7078,8 +7078,6 @@ static inline int preempt_count_equals(i
 
 void __might_sleep(const char *file, int line, int preempt_offset)
 {
-   static unsigned long prev_jiffy;/* ratelimiting */
-
/*
 * Blocking primitives will set (and therefore destroy) current->state,
 * since we will exit with TASK_RUNNING make sure we enter with it,
@@ -7093,6 +7091,14 @@ void __might_sleep(const char *file, int
(void *)current->task_state_change))
__set_current_state(TASK_RUNNING);
 
+   ___might_sleep(file, line, preempt_offset);
+}
+EXPORT_SYMBOL(__might_sleep);
+
+void ___might_sleep(const char *file, int line, int preempt_offset)
+{
+   static unsigned long prev_jiffy;/* ratelimiting */
+
rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
 !is_idle_task(current)) ||
@@ -7122,7 +7128,7 @@ void __might_sleep(const char *file, int
 #endif
dump_stack();
 }
-EXPORT_SYMBOL(__might_sleep);
+EXPORT_SYMBOL(___might_sleep);
 #endif
 
 #ifdef CONFIG_MAGIC_SYSRQ


pgpDOrSu0ZLAM.pgp
Description: PGP signature


Re: [RFC][PATCH 0/7] nested sleeps, fixes and debug infra

2014-08-05 Thread Ilya Dryomov
On Mon, Aug 4, 2014 at 2:30 PM, Peter Zijlstra  wrote:
> Hi,
>
> Ilya recently tripped over a nested sleep which made Ingo suggest we should
> have debug checks for that. So I did some, see patch 7. Of course that
> triggered a whole bunch of fail the instant I tried to boot my machine.
>
> With this series I can boot my test box and build a kernel on it, I'm fairly
> sure that's far too limited a test to have found all, but its a start.

FWIW, I'm getting a lot of these during light rbd testing.  CC'ed
netdev and linux-mm.

WARNING: CPU: 2 PID: 1978 at kernel/sched/core.c:7094 __might_sleep+0x5b/0x1e0()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[] prepare_to_wait+0x50/0xa0
Modules linked in:
CPU: 2 PID: 1978 Comm: ceph-osd Not tainted 3.16.0-vm+ #109
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 1bb6 8800126739e8 8156ec1d 
 880012673a38 880012673a28 81032c27 880012673a58
 0200 8800150fa060 07ad 817ed352
Call Trace:
 [] dump_stack+0x4f/0x7c
 [] warn_slowpath_common+0x87/0xb0
 [] warn_slowpath_fmt+0x41/0x50
 [] ? tcp_v4_do_rcv+0x10f/0x4a0
 [] ? prepare_to_wait+0x50/0xa0
 [] ? prepare_to_wait+0x50/0xa0
 [] __might_sleep+0x5b/0x1e0
 [] release_sock+0x13d/0x200
 [] sk_stream_wait_memory+0x133/0x2d0
 [] ? woken_wake_function+0x10/0x10
 [] tcp_sendmsg+0xb6f/0xd70
 [] inet_sendmsg+0xdf/0x100
 [] ? inet_recvmsg+0x100/0x100
 [] sock_sendmsg+0x67/0x90
 [] ? might_fault+0x51/0xb0
 [] ___sys_sendmsg+0x2d2/0x2e0
 [] ? futex_wake+0x128/0x140
 [] ? futex_wake+0x1/0x140
 [] ? do_dup2+0xd0/0xd0
 [] ? get_parent_ip+0x11/0x50
 [] ? debug_smp_processor_id+0x17/0x20
 [] ? delay_tsc+0x85/0xb0
 [] ? __fget+0xdd/0xf0
 [] ? do_dup2+0xd0/0xd0
 [] ? __fget_light+0x45/0x60
 [] ? __fdget+0xe/0x10
 [] __sys_sendmsg+0x44/0x70
 [] SyS_sendmsg+0x9/0x10
 [] system_call_fastpath+0x16/0x1b

WARNING: CPU: 0 PID: 380 at kernel/sched/core.c:7094 __might_sleep+0x5b/0x1e0()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[] prepare_to_wait+0x50/0xa0
Modules linked in:
CPU: 0 PID: 380 Comm: kswapd0 Tainted: GW 3.16.0-vm+ #109
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 1bb6 88007b64bc68 8156ec1d 
 88007b64bcb8 88007b64bca8 81032c27 
  88007c062060 0065 8179ca1f
Call Trace:
 [] dump_stack+0x4f/0x7c
 [] warn_slowpath_common+0x87/0xb0
 [] warn_slowpath_fmt+0x41/0x50
 [] ? prepare_to_wait+0x50/0xa0
 [] ? prepare_to_wait+0x50/0xa0
 [] __might_sleep+0x5b/0x1e0
 [] __reset_isolation_suitable+0x83/0x140
 [] reset_isolation_suitable+0x33/0x50
 [] kswapd+0x2e7/0x4d0
 [] ? woken_wake_function+0x10/0x10
 [] ? balance_pgdat+0x5b0/0x5b0
 [] kthread+0xfb/0x110
 [] ? flush_kthread_worker+0x130/0x130
 [] ret_from_fork+0x7c/0xb0
 [] ? flush_kthread_worker+0x130/0x130

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/