Re: [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-17 Thread Mathieu Desnoyers
* Ingo Molnar (mi...@kernel.org) wrote:
> 
> * Mathieu Desnoyers  wrote:
> 
> > * Ingo Molnar (mi...@kernel.org) wrote:
> > > 
> > > * Mathieu Desnoyers  wrote:
> > > 
> > > > Hi Ingo,
> > > > 
> > > > Do you have an estimate of the time it will take for this fix to hit 
> > > > mainline, stable-3.10 and stable-3.11 ? Meanwhile, I'm marking 3.10 and 
> > > > 3.11 as broken for LTTng with a kernel version at compile-time, since 
> > > > this kernel regression currently triggers hard system lockup when 
> > > > people 
> > > > use LTTng on those kernels, and this is certainly something nobody 
> > > > wants.
> > > 
> > > So, at least as per the description of John, this should only trigger if 
> > > SCHED_HRTICK is enabled in sched_features - which is disabled by default, 
> > > it's a debug-only development feature. Does the bug trigger on more 
> > > regular kernels as well?
> > 
> > Unfortunately, it does happen on a pretty standard kernel config (giving
> > my x230 config as example below). Pasting relevant bug description from
> > http://bugs.lttng.org/issues/631 :
> > 
> > "Starting from Linux kernel commit
> > 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 "timekeeping: Hold
> > timekeepering locks in do_adjtimex and hardpps" (3.10 kernels), the
> > xtime write seqlock is held across calls to __do_adjtimex(), which
> > includes a call to notify_cmos_timer(), and hence
> > schedule_delayed_work().
> > 
> > This introduces a side-effect for a set of tracepoints, including mainly 
> > the workqueue tracepoints: a tracer hooking on those tracepoints and 
> > reading current time with ktime_get() will cause hard system LOCKUP"
> 
> It's the LTTng tracepoint 'hooking' in something that does something 
> invalid in that context that is causing the hang, not the vanilla kernel 
> itself, right?

Yes, that's correct. In order to ensure this kind of problem is entirely
taken care of, I've started working on a synchronization scheme proposed
by Peter Zijlstra that would allow ktime() to be called from any
execution context (see:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg504089.html).

> 
> In that case the 'you get to keep both pieces' policy of out of tree code 
> applies - but the HRTICK fix should solve your problem as well, 
> incidentally.

Thanks,

Mathieu

> 
> Thanks,
> 
>   Ingo

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-17 Thread Ingo Molnar

* Mathieu Desnoyers  wrote:

> * Ingo Molnar (mi...@kernel.org) wrote:
> > 
> > * Mathieu Desnoyers  wrote:
> > 
> > > Hi Ingo,
> > > 
> > > Do you have an estimate of the time it will take for this fix to hit 
> > > mainline, stable-3.10 and stable-3.11 ? Meanwhile, I'm marking 3.10 and 
> > > 3.11 as broken for LTTng with a kernel version at compile-time, since 
> > > this kernel regression currently triggers hard system lockup when people 
> > > use LTTng on those kernels, and this is certainly something nobody 
> > > wants.
> > 
> > So, at least as per the description of John, this should only trigger if 
> > SCHED_HRTICK is enabled in sched_features - which is disabled by default, 
> > it's a debug-only development feature. Does the bug trigger on more 
> > regular kernels as well?
> 
> Unfortunately, it does happen on a pretty standard kernel config (giving
> my x230 config as example below). Pasting relevant bug description from
> http://bugs.lttng.org/issues/631 :
> 
> "Starting from Linux kernel commit
> 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 "timekeeping: Hold
> timekeepering locks in do_adjtimex and hardpps" (3.10 kernels), the
> xtime write seqlock is held across calls to __do_adjtimex(), which
> includes a call to notify_cmos_timer(), and hence
> schedule_delayed_work().
> 
> This introduces a side-effect for a set of tracepoints, including mainly 
> the workqueue tracepoints: a tracer hooking on those tracepoints and 
> reading current time with ktime_get() will cause hard system LOCKUP"

It's the LTTng tracepoint 'hooking' in something that does something 
invalid in that context that is causing the hang, not the vanilla kernel 
itself, right?

In that case the 'you get to keep both pieces' policy of out of tree code 
applies - but the HRTICK fix should solve your problem as well, 
incidentally.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-17 Thread Ingo Molnar

* Mathieu Desnoyers  wrote:

> Hi Ingo,
> 
> Do you have an estimate of the time it will take for this fix to hit 
> mainline, stable-3.10 and stable-3.11 ? Meanwhile, I'm marking 3.10 and 
> 3.11 as broken for LTTng with a kernel version at compile-time, since 
> this kernel regression currently triggers hard system lockup when people 
> use LTTng on those kernels, and this is certainly something nobody 
> wants.

So, at least as per the description of John, this should only trigger if 
SCHED_HRTICK is enabled in sched_features - which is disabled by default, 
it's a debug-only development feature. Does the bug trigger on more 
regular kernels as well?

I planned to send it Linus after v3.12-rc1, in the next day or two.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-17 Thread Ingo Molnar

* Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:

 Hi Ingo,
 
 Do you have an estimate of the time it will take for this fix to hit 
 mainline, stable-3.10 and stable-3.11 ? Meanwhile, I'm marking 3.10 and 
 3.11 as broken for LTTng with a kernel version at compile-time, since 
 this kernel regression currently triggers hard system lockup when people 
 use LTTng on those kernels, and this is certainly something nobody 
 wants.

So, at least as per the description of John, this should only trigger if 
SCHED_HRTICK is enabled in sched_features - which is disabled by default, 
it's a debug-only development feature. Does the bug trigger on more 
regular kernels as well?

I planned to send it Linus after v3.12-rc1, in the next day or two.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-17 Thread Ingo Molnar

* Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:

 * Ingo Molnar (mi...@kernel.org) wrote:
  
  * Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:
  
   Hi Ingo,
   
   Do you have an estimate of the time it will take for this fix to hit 
   mainline, stable-3.10 and stable-3.11 ? Meanwhile, I'm marking 3.10 and 
   3.11 as broken for LTTng with a kernel version at compile-time, since 
   this kernel regression currently triggers hard system lockup when people 
   use LTTng on those kernels, and this is certainly something nobody 
   wants.
  
  So, at least as per the description of John, this should only trigger if 
  SCHED_HRTICK is enabled in sched_features - which is disabled by default, 
  it's a debug-only development feature. Does the bug trigger on more 
  regular kernels as well?
 
 Unfortunately, it does happen on a pretty standard kernel config (giving
 my x230 config as example below). Pasting relevant bug description from
 http://bugs.lttng.org/issues/631 :
 
 Starting from Linux kernel commit
 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 timekeeping: Hold
 timekeepering locks in do_adjtimex and hardpps (3.10 kernels), the
 xtime write seqlock is held across calls to __do_adjtimex(), which
 includes a call to notify_cmos_timer(), and hence
 schedule_delayed_work().
 
 This introduces a side-effect for a set of tracepoints, including mainly 
 the workqueue tracepoints: a tracer hooking on those tracepoints and 
 reading current time with ktime_get() will cause hard system LOCKUP

It's the LTTng tracepoint 'hooking' in something that does something 
invalid in that context that is causing the hang, not the vanilla kernel 
itself, right?

In that case the 'you get to keep both pieces' policy of out of tree code 
applies - but the HRTICK fix should solve your problem as well, 
incidentally.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-17 Thread Mathieu Desnoyers
* Ingo Molnar (mi...@kernel.org) wrote:
 
 * Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:
 
  * Ingo Molnar (mi...@kernel.org) wrote:
   
   * Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:
   
Hi Ingo,

Do you have an estimate of the time it will take for this fix to hit 
mainline, stable-3.10 and stable-3.11 ? Meanwhile, I'm marking 3.10 and 
3.11 as broken for LTTng with a kernel version at compile-time, since 
this kernel regression currently triggers hard system lockup when 
people 
use LTTng on those kernels, and this is certainly something nobody 
wants.
   
   So, at least as per the description of John, this should only trigger if 
   SCHED_HRTICK is enabled in sched_features - which is disabled by default, 
   it's a debug-only development feature. Does the bug trigger on more 
   regular kernels as well?
  
  Unfortunately, it does happen on a pretty standard kernel config (giving
  my x230 config as example below). Pasting relevant bug description from
  http://bugs.lttng.org/issues/631 :
  
  Starting from Linux kernel commit
  06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 timekeeping: Hold
  timekeepering locks in do_adjtimex and hardpps (3.10 kernels), the
  xtime write seqlock is held across calls to __do_adjtimex(), which
  includes a call to notify_cmos_timer(), and hence
  schedule_delayed_work().
  
  This introduces a side-effect for a set of tracepoints, including mainly 
  the workqueue tracepoints: a tracer hooking on those tracepoints and 
  reading current time with ktime_get() will cause hard system LOCKUP
 
 It's the LTTng tracepoint 'hooking' in something that does something 
 invalid in that context that is causing the hang, not the vanilla kernel 
 itself, right?

Yes, that's correct. In order to ensure this kind of problem is entirely
taken care of, I've started working on a synchronization scheme proposed
by Peter Zijlstra that would allow ktime() to be called from any
execution context (see:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg504089.html).

 
 In that case the 'you get to keep both pieces' policy of out of tree code 
 applies - but the HRTICK fix should solve your problem as well, 
 incidentally.

Thanks,

Mathieu

 
 Thanks,
 
   Ingo

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-16 Thread Mathieu Desnoyers
Hi Ingo,

Do you have an estimate of the time it will take for this fix to hit
mainline, stable-3.10 and stable-3.11 ? Meanwhile, I'm marking 3.10 and
3.11 as broken for LTTng with a kernel version at compile-time, since
this kernel regression currently triggers hard system lockup when people
use LTTng on those kernels, and this is certainly something nobody
wants.

Thanks,

Mathieu

* tip-bot for John Stultz (tip...@zytor.com) wrote:
> Commit-ID:  7bd36014460f793c19e7d6c94dab67b0afcfcb7f
> Gitweb: http://git.kernel.org/tip/7bd36014460f793c19e7d6c94dab67b0afcfcb7f
> Author: John Stultz 
> AuthorDate: Wed, 11 Sep 2013 16:50:56 -0700
> Committer:  Ingo Molnar 
> CommitDate: Thu, 12 Sep 2013 07:49:51 +0200
> 
> timekeeping: Fix HRTICK related deadlock from ntp lock changes
> 
> Gerlando Falauto reported that when HRTICK is enabled, it is
> possible to trigger system deadlocks. These were hard to
> reproduce, as HRTICK has been broken in the past, but seemed
> to be connected to the timekeeping_seq lock.
> 
> Since seqlock/seqcount's aren't supported w/ lockdep, I added
> some extra spinlock based locking and triggered the following
> lockdep output:
> 
> [   15.849182] ntpd/4062 is trying to acquire lock:
> [   15.849765]  (&(>lock)->rlock){..-...}, at: [] 
> __queue_work+0x145/0x480
> [   15.850051]
> [   15.850051] but task is already holding lock:
> [   15.850051]  (timekeeper_lock){-.-.-.}, at: [] 
> do_adjtimex+0x7f/0x100
> 
> 
> 
> [   15.850051] Chain exists of: &(>lock)->rlock --> >pi_lock --> 
> timekeeper_lock
> [   15.850051]  Possible unsafe locking scenario:
> [   15.850051]
> [   15.850051]CPU0CPU1
> [   15.850051]
> [   15.850051]   lock(timekeeper_lock);
> [   15.850051]lock(>pi_lock);
> [   15.850051] lock(timekeeper_lock);
> [   15.850051] lock(&(>lock)->rlock);
> [   15.850051]
> [   15.850051]  *** DEADLOCK ***
> 
> The deadlock was introduced by 06c017fdd4dc48451a ("timekeeping:
> Hold timekeepering locks in do_adjtimex and hardpps") in 3.10
> 
> This patch avoids this deadlock, by moving the call to
> schedule_delayed_work() outside of the timekeeper lock
> critical section.
> 
> Reported-by: Gerlando Falauto 
> Tested-by: Lin Ming 
> Signed-off-by: John Stultz 
> Cc: Mathieu Desnoyers 
> Cc: stable  #3.11, 3.10
> Link: 
> http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stu...@linaro.org
> Signed-off-by: Ingo Molnar 
> ---
>  include/linux/timex.h | 1 +
>  kernel/time/ntp.c | 6 ++
>  kernel/time/timekeeping.c | 2 ++
>  3 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/timex.h b/include/linux/timex.h
> index b3726e6..dd3edd7 100644
> --- a/include/linux/timex.h
> +++ b/include/linux/timex.h
> @@ -141,6 +141,7 @@ extern int do_adjtimex(struct timex *);
>  extern void hardpps(const struct timespec *, const struct timespec *);
>  
>  int read_current_timer(unsigned long *timer_val);
> +void ntp_notify_cmos_timer(void);
>  
>  /* The clock frequency of the i8253/i8254 PIT */
>  #define PIT_TICK_RATE 1193182ul
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 8f5b3b9..bb22151 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -516,13 +516,13 @@ static void sync_cmos_clock(struct work_struct *work)
>   schedule_delayed_work(_cmos_work, timespec_to_jiffies());
>  }
>  
> -static void notify_cmos_timer(void)
> +void ntp_notify_cmos_timer(void)
>  {
>   schedule_delayed_work(_cmos_work, 0);
>  }
>  
>  #else
> -static inline void notify_cmos_timer(void) { }
> +void ntp_notify_cmos_timer(void) { }
>  #endif
>  
>  
> @@ -687,8 +687,6 @@ int __do_adjtimex(struct timex *txc, struct timespec *ts, 
> s32 *time_tai)
>   if (!(time_status & STA_NANO))
>   txc->time.tv_usec /= NSEC_PER_USEC;
>  
> - notify_cmos_timer();
> -
>   return result;
>  }
>  
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 48b9fff..947ba25 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1703,6 +1703,8 @@ int do_adjtimex(struct timex *txc)
>   write_seqcount_end(_seq);
>   raw_spin_unlock_irqrestore(_lock, flags);
>  
> + ntp_notify_cmos_timer();
> +
>   return ret;
>  }
>  

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

2013-09-16 Thread Mathieu Desnoyers
Hi Ingo,

Do you have an estimate of the time it will take for this fix to hit
mainline, stable-3.10 and stable-3.11 ? Meanwhile, I'm marking 3.10 and
3.11 as broken for LTTng with a kernel version at compile-time, since
this kernel regression currently triggers hard system lockup when people
use LTTng on those kernels, and this is certainly something nobody
wants.

Thanks,

Mathieu

* tip-bot for John Stultz (tip...@zytor.com) wrote:
 Commit-ID:  7bd36014460f793c19e7d6c94dab67b0afcfcb7f
 Gitweb: http://git.kernel.org/tip/7bd36014460f793c19e7d6c94dab67b0afcfcb7f
 Author: John Stultz john.stu...@linaro.org
 AuthorDate: Wed, 11 Sep 2013 16:50:56 -0700
 Committer:  Ingo Molnar mi...@kernel.org
 CommitDate: Thu, 12 Sep 2013 07:49:51 +0200
 
 timekeeping: Fix HRTICK related deadlock from ntp lock changes
 
 Gerlando Falauto reported that when HRTICK is enabled, it is
 possible to trigger system deadlocks. These were hard to
 reproduce, as HRTICK has been broken in the past, but seemed
 to be connected to the timekeeping_seq lock.
 
 Since seqlock/seqcount's aren't supported w/ lockdep, I added
 some extra spinlock based locking and triggered the following
 lockdep output:
 
 [   15.849182] ntpd/4062 is trying to acquire lock:
 [   15.849765]  ((pool-lock)-rlock){..-...}, at: [810aa9b5] 
 __queue_work+0x145/0x480
 [   15.850051]
 [   15.850051] but task is already holding lock:
 [   15.850051]  (timekeeper_lock){-.-.-.}, at: [810df6df] 
 do_adjtimex+0x7f/0x100
 
 snip
 
 [   15.850051] Chain exists of: (pool-lock)-rlock -- p-pi_lock -- 
 timekeeper_lock
 [   15.850051]  Possible unsafe locking scenario:
 [   15.850051]
 [   15.850051]CPU0CPU1
 [   15.850051]
 [   15.850051]   lock(timekeeper_lock);
 [   15.850051]lock(p-pi_lock);
 [   15.850051] lock(timekeeper_lock);
 [   15.850051] lock((pool-lock)-rlock);
 [   15.850051]
 [   15.850051]  *** DEADLOCK ***
 
 The deadlock was introduced by 06c017fdd4dc48451a (timekeeping:
 Hold timekeepering locks in do_adjtimex and hardpps) in 3.10
 
 This patch avoids this deadlock, by moving the call to
 schedule_delayed_work() outside of the timekeeper lock
 critical section.
 
 Reported-by: Gerlando Falauto gerlando.fala...@keymile.com
 Tested-by: Lin Ming min...@gmail.com
 Signed-off-by: John Stultz john.stu...@linaro.org
 Cc: Mathieu Desnoyers mathieu.desnoy...@efficios.com
 Cc: stable sta...@vger.kernel.org #3.11, 3.10
 Link: 
 http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stu...@linaro.org
 Signed-off-by: Ingo Molnar mi...@kernel.org
 ---
  include/linux/timex.h | 1 +
  kernel/time/ntp.c | 6 ++
  kernel/time/timekeeping.c | 2 ++
  3 files changed, 5 insertions(+), 4 deletions(-)
 
 diff --git a/include/linux/timex.h b/include/linux/timex.h
 index b3726e6..dd3edd7 100644
 --- a/include/linux/timex.h
 +++ b/include/linux/timex.h
 @@ -141,6 +141,7 @@ extern int do_adjtimex(struct timex *);
  extern void hardpps(const struct timespec *, const struct timespec *);
  
  int read_current_timer(unsigned long *timer_val);
 +void ntp_notify_cmos_timer(void);
  
  /* The clock frequency of the i8253/i8254 PIT */
  #define PIT_TICK_RATE 1193182ul
 diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
 index 8f5b3b9..bb22151 100644
 --- a/kernel/time/ntp.c
 +++ b/kernel/time/ntp.c
 @@ -516,13 +516,13 @@ static void sync_cmos_clock(struct work_struct *work)
   schedule_delayed_work(sync_cmos_work, timespec_to_jiffies(next));
  }
  
 -static void notify_cmos_timer(void)
 +void ntp_notify_cmos_timer(void)
  {
   schedule_delayed_work(sync_cmos_work, 0);
  }
  
  #else
 -static inline void notify_cmos_timer(void) { }
 +void ntp_notify_cmos_timer(void) { }
  #endif
  
  
 @@ -687,8 +687,6 @@ int __do_adjtimex(struct timex *txc, struct timespec *ts, 
 s32 *time_tai)
   if (!(time_status  STA_NANO))
   txc-time.tv_usec /= NSEC_PER_USEC;
  
 - notify_cmos_timer();
 -
   return result;
  }
  
 diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
 index 48b9fff..947ba25 100644
 --- a/kernel/time/timekeeping.c
 +++ b/kernel/time/timekeeping.c
 @@ -1703,6 +1703,8 @@ int do_adjtimex(struct timex *txc)
   write_seqcount_end(timekeeper_seq);
   raw_spin_unlock_irqrestore(timekeeper_lock, flags);
  
 + ntp_notify_cmos_timer();
 +
   return ret;
  }
  

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/