Re: linux-next: Tree for Feb 4

On Fri, Feb 6, 2015 at 1:12 AM, Steven Rostedt  wrote:
> On Fri, 6 Feb 2015 00:53:41 +0100
> Sedat Dilek  wrote:
>
>> > See that if (IS_ENABLED(CONFIG_LOCKDEP))?
>> >
>>
>> I have here...
>>
>> CONFIG_LOCKDEP=y
>
> Yep, I knew that (you wouldn't get splats without it).
>
>
>> Which old patch?
>> "tlb: Don't do trace_tlb_flush() on offline CPUs" ?
>
> Yeah, that one. In other words, just add this patch on the kernel you
> just tested.
>
> Thanks,
>

Do you have a name with label for your patch?

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: Tree for Feb 4

On Fri, 6 Feb 2015 00:53:41 +0100
Sedat Dilek  wrote:

> > See that if (IS_ENABLED(CONFIG_LOCKDEP))?
> >
> 
> I have here...
> 
> CONFIG_LOCKDEP=y

Yep, I knew that (you wouldn't get splats without it).


> Which old patch?
> "tlb: Don't do trace_tlb_flush() on offline CPUs" ?

Yeah, that one. In other words, just add this patch on the kernel you
just tested.

Thanks,

-- Steve

Re: linux-next: Tree for Feb 4

[...]
>> That said, let's add this (on top of the old patch):
>>
>
> Which old patch?
> "tlb: Don't do trace_tlb_flush() on offline CPUs" ?
>

Or did you mean "x86/mm: Omit switch_mm() tracing for offline CPUs"?

- Sedat -

Re: linux-next: Tree for Feb 4

On Fri, Feb 6, 2015 at 12:11 AM, Steven Rostedt  wrote:
> On Thu, 5 Feb 2015 23:16:21 +0100
> Sedat Dilek  wrote:
>
>> On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt  wrote:
>> > On Thu, 5 Feb 2015 22:45:59 +0100
>> > Sedat Dilek  wrote:
>> >
>> >> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
>> >
>> > Heh, yeah, I typed that entire line in by hand. Just be lucky that was
>> > the only typo ;-)
>> >
>> >>
>> >> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
>> >> 1
>> >>
>> >> [  391.090381] intel_pstate CPU 1 exiting
>> >> [  391.104491] smpboot: CPU 1 is now offline
>> >>
>> >
>> > Now, if you disable that (echo 0 to that file), do you still get the
>> > rcu lockdep splat if you suspend and resume?
>> >
>>
>> YES, I get the call-trace again!
>>
>
> Bah! I see where the warning comes from. In include/linux/tracepoint.h
> we have:
>
> #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
>         extern struct tracepoint __tracepoint_##name;                   \
>         static inline void trace_##name(proto)                          \
>         {                                                               \
>                 if (static_key_false(&__tracepoint_##name.key))         \
>                         __DO_TRACE(&__tracepoint_##name,                \
>                                 TP_PROTO(data_proto),                   \
>                                 TP_ARGS(data_args),                     \
>                                 TP_CONDITION(cond),,);                  \
>                 if (IS_ENABLED(CONFIG_LOCKDEP)) {                       \
>                         rcu_read_lock_sched_notrace();                  \
>                         rcu_dereference_sched(__tracepoint_##name.funcs);\
>                         rcu_read_unlock_sched_notrace();                \
>                 }                                                       \
>         }                                                               \
>
> See that if (IS_ENABLED(CONFIG_LOCKDEP))?
>

I have here...

CONFIG_LOCKDEP=y

- Sedat -

> I'm recalling this. Because tracepoints require RCU, and RCU lockdep
> doesn't trigger if a tracepoint isn't enabled (because the rcu calls
> are hidden in the __DO_TRACE() behind that static_key_false), we would
> be missing lots of rcu problem tracepoints because tests were run
> without them enabled.
>
> The answer was to add this rcu check when LOCKDEP was enabled. So no,
> adding that conditional isn't going to help, because lockdep will
> trigger here, even if it were safe because of the conditional :-/.
>
> That said, let's add this (on top of the old patch):
>

Which old patch?
"tlb: Don't do trace_tlb_flush() on offline CPUs" ?

- Sedat -

> (again, not tested)
>
> Signed-off-by: Steven Rostedt 
> ---
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 4b75d591eb5e..401b5bfbcdbd 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -47,7 +47,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>
>  		/* Re-load page tables */
>  		load_cr3(next->pgd);
> -		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> +		/*
> +		 * Do not check rcu when tracing is not enabled. The
> +		 * tracepoint has a condition to not trace if the CPU is
> +		 * offline, and rcu check will complain if it is.
> +		 */
> +		trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
>  		/* Stop flush ipis for the previous mm */
>  		cpumask_clear_cpu(cpu, mm_cpumask(prev));
> @@ -84,7 +89,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  			 * to make sure to use no freed page tables.
>  			 */
>  			load_cr3(next->pgd);
> -			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> +			/*
> +			 * Do not check rcu when tracing is not enabled. The
> +			 * tracepoint has a condition to not trace if the CPU is
> +			 * offline, and rcu check will complain if it is.
> +			 */
> +			trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH,
> +						    TLB_FLUSH_ALL);
>  			load_LDT_nolock(&next->context);
>  		}
>  	}
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index e08e21e5f601..747a05aceb60 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -179,6 +179,14 @@ extern void syscall_unregfunc(void);
> rcu_read_unlock_sched_notrace();\
> }

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 23:16:21 +0100
Sedat Dilek  wrote:

> On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt  wrote:
> > On Thu, 5 Feb 2015 22:45:59 +0100
> > Sedat Dilek  wrote:
> >
> >> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
> >
> > Heh, yeah, I typed that entire line in by hand. Just be lucky that was
> > the only typo ;-)
> >
> >>
> >> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
> >> 1
> >>
> >> [  391.090381] intel_pstate CPU 1 exiting
> >> [  391.104491] smpboot: CPU 1 is now offline
> >>
> >
> > Now, if you disable that (echo 0 to that file), do you still get the
> > rcu lockdep splat if you suspend and resume?
> >
> 
> YES, I get the call-trace again!
> 

Bah! I see where the warning comes from. In include/linux/tracepoint.h
we have:

#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
        extern struct tracepoint __tracepoint_##name;                   \
        static inline void trace_##name(proto)                          \
        {                                                               \
                if (static_key_false(&__tracepoint_##name.key))         \
                        __DO_TRACE(&__tracepoint_##name,                \
                                TP_PROTO(data_proto),                   \
                                TP_ARGS(data_args),                     \
                                TP_CONDITION(cond),,);                  \
                if (IS_ENABLED(CONFIG_LOCKDEP)) {                       \
                        rcu_read_lock_sched_notrace();                  \
                        rcu_dereference_sched(__tracepoint_##name.funcs);\
                        rcu_read_unlock_sched_notrace();                \
                }                                                       \
        }                                                               \

See that if (IS_ENABLED(CONFIG_LOCKDEP))?

I'm recalling this. Because tracepoints require RCU, and RCU lockdep
doesn't trigger if a tracepoint isn't enabled (because the rcu calls
are hidden in the __DO_TRACE() behind that static_key_false), we would
be missing lots of rcu problem tracepoints because tests were run
without them enabled.

The answer was to add this rcu check when LOCKDEP was enabled. So no,
adding that conditional isn't going to help, because lockdep will
trigger here, even if it were safe because of the conditional :-/.
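
To make the above concrete, here is a rough userspace model of the behaviour, offered only as a sketch and not as kernel code. Every name in it (`struct model_tracepoint`, `model_trace()`, `model_trace_rcu_nocheck()`, and the flag variables) is a hypothetical stand-in for the static key, CONFIG_LOCKDEP, the tracepoint condition, and the RCU-lockdep check. The `_rcu_nocheck` variant is modelled only from what its name and the comment in the patch suggest.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct model_tracepoint {
    bool key_enabled;   /* models static_key_false(&__tracepoint_##name.key) */
    bool cond;          /* models TP_CONDITION(cond), e.g. "CPU is online" */
    int  hits;          /* number of times the probe actually ran */
};

static bool model_lockdep = true;     /* models IS_ENABLED(CONFIG_LOCKDEP) */
static bool model_cpu_online = true;  /* models cpu_online(smp_processor_id()) */
static int  model_splats;             /* count of "suspicious RCU usage" reports */

/* Models the RCU-lockdep check: it complains when run on an offline CPU. */
static void model_rcu_check(void)
{
    if (!model_cpu_online)
        model_splats++;
}

/* Models trace_##name(): the lockdep check runs even when the key is off. */
static void model_trace(struct model_tracepoint *tp)
{
    /* Enabled path: the condition is tested before any RCU use. */
    if (tp->key_enabled && tp->cond) {
        model_rcu_check();
        tp->hits++;
    }
    /* Unconditional check added so that disabled tracepoints get tested too. */
    if (model_lockdep)
        model_rcu_check();
}

/* Models the proposed trace_##name##_rcu_nocheck(): no unconditional check. */
static void model_trace_rcu_nocheck(struct model_tracepoint *tp)
{
    if (tp->key_enabled && tp->cond) {
        model_rcu_check();
        tp->hits++;
    }
}
```

With the key disabled and `model_cpu_online` set to false, `model_trace()` still increments `model_splats`, which corresponds to the splat seen with tracing turned off, while `model_trace_rcu_nocheck()` leaves the counter alone.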

That said, let's add this (on top of the old patch):

(again, not tested)

Signed-off-by: Steven Rostedt 
---
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 4b75d591eb5e..401b5bfbcdbd 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -47,7 +47,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 
 		/* Re-load page tables */
 		load_cr3(next->pgd);
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+		/*
+		 * Do not check rcu when tracing is not enabled. The
+		 * tracepoint has a condition to not trace if the CPU is
+		 * offline, and rcu check will complain if it is.
+		 */
+		trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
@@ -84,7 +89,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
-			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+			/*
+			 * Do not check rcu when tracing is not enabled. The
+			 * tracepoint has a condition to not trace if the CPU is
+			 * offline, and rcu check will complain if it is.
+			 */
+			trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH,
+						    TLB_FLUSH_ALL);
 			load_LDT_nolock(&next->context);
 		}
 	}
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index e08e21e5f601..747a05aceb60 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -179,6 +179,14 @@ extern void syscall_unregfunc(void);
rcu_read_unlock_sched_notrace();\
}   \
}   \
+   static inline void trace_##name##_rcu_nocheck(proto)\
+   {   \
+   if (static_key_false(&__tracepoint_##name.key)) \
+

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt  wrote:
> On Thu, 5 Feb 2015 22:45:59 +0100
> Sedat Dilek  wrote:
>
>> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
>
> Heh, yeah, I typed that entire line in by hand. Just be lucky that was
> the only typo ;-)
>
>>
>> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
>> 1
>>
>> [  391.090381] intel_pstate CPU 1 exiting
>> [  391.104491] smpboot: CPU 1 is now offline
>>
>
> Now, if you disable that (echo 0 to that file), do you still get the
> rcu lockdep splat if you suspend and resume?
>

YES, I get the call-trace again!

# cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
0

# cat /sys/devices/system/cpu/cpu1/online
0

[ 2470.606222] intel_pstate CPU 1 exiting
[ 2470.628153]
[ 2470.628155] ===
[ 2470.628156] [ INFO: suspicious RCU usage. ]
[ 2470.628159] 3.19.0-rc7-next-20150204.9-iniza-small #1 Not tainted
[ 2470.628160] ---
[ 2470.628162] include/trace/events/tlb.h:37 suspicious
rcu_dereference_check() usage!
[ 2470.628163]
[ 2470.628163] other info that might help us debug this:
[ 2470.628163]
[ 2470.628164]
[ 2470.628164] RCU used illegally from offline CPU!
[ 2470.628164] rcu_scheduler_active = 1, debug_locks = 0
[ 2470.628165] no locks held by swapper/1/0.
[ 2470.628166]
[ 2470.628166] stack backtrace:
[ 2470.628169] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
3.19.0-rc7-next-20150204.9-iniza-small #1
[ 2470.628171] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 2470.628176]  0001 88011a44fe18 817ecf4d
0011
[ 2470.628179]  88011a448290 88011a44fe48 810d6b57
8800cc2660c0
[ 2470.628182]  0001 81d35160 0002
88011a44fe78
[ 2470.628183] Call Trace:
[ 2470.628192]  [] dump_stack+0x4c/0x65
[ 2470.628198]  [] lockdep_rcu_suspicious+0xe7/0x120
[ 2470.628203]  [] idle_task_exit+0x1c9/0x260
[ 2470.628208]  [] play_dead_common+0xe/0x50
[ 2470.628211]  [] native_play_dead+0x15/0x140
[ 2470.628216]  [] arch_cpu_idle_dead+0xf/0x20
[ 2470.628219]  [] cpu_startup_entry+0x37e/0x580
[ 2470.628222] smpboot: CPU 1 didn't die...
[ 2470.628224]  [] start_secondary+0x140/0x150

- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 22:45:59 +0100
Sedat Dilek  wrote:

> Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:

Heh, yeah, I typed that entire line in by hand. Just be lucky that was
the only typo ;-)

> 
> # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
> 1
> 
> [  391.090381] intel_pstate CPU 1 exiting
> [  391.104491] smpboot: CPU 1 is now offline
> 

Now, if you disable that (echo 0 to that file), do you still get the
rcu lockdep splat if you suspend and resume?

-- Steve

Re: linux-next: Tree for Feb 4

[...]
>>> >> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>>> >> ( It's good to see it's reproducible. )
>>> >
>>> > Was the tracepoint enabled? Or was there some other rcu call that
>>> > triggered this. Or would cpu_online(smp_processor_id()) return true at
>>> > this point?
>>> >
>>>
>>> Thanks Steve for jumping into this one!
>>>
>>> Good point.
>>> I looked at my kernel-config (which I already sent :-)).
>>>
>>> Do I need to enable...?
>>>
>>> # CONFIG_RCU_TRACE is not set
>>>
>>> ...or even more?
>>>
>>
>> What I meant by the tracepoint being enabled, was not that it was
>> configured in (I'm assuming it was), but that you started tracing?
>>
>>  echo 1 > /sys/kernel/debug/tracing/events/enable
>>
>> or
>>
>>  echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
>>
>
> NO, I did not start any tracing before doing my testing.
>
> # cat /sys/kernel/debug/tracing/events/enable
> 0
>
> # echo 1 > /sys/kernel/debug/tracing/events/enable
>
> # cat /sys/kernel/debug/tracing/events/enable
> X
>
> # LC_ALL=C cat /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
> cat: /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable: No such
> file or directory
>
> Looks like I need to enable...?
>
> # CONFIG_DEBUG_TLBFLUSH is not set
>

Here is my new kernel-config (not sure whether I really need these to be enabled):

$ ./scripts/diffconfig
/boot/config-3.19.0-rc7-next-20150204.7-iniza-small
/boot/config-3.19.0-rc7-next-20150204.9-iniza-small
 DEBUG_TLBFLUSH n -> y
 RCU_TRACE n -> y
 TREE_RCU_TRACE n -> y

Steve, this was a typo: it's called tlb_flush, not tlb_flush*ed*:

# cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
1

[  391.090381] intel_pstate CPU 1 exiting
[  391.104491] smpboot: CPU 1 is now offline

- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 9:22 PM, Steven Rostedt  wrote:
> On Thu, 5 Feb 2015 21:07:27 +0100
> Sedat Dilek  wrote:
>
>> > Is this Paul's version of the patch or mine? If it is just mine, do you
>> > know if Paul's version triggers this too?
>> >
>>
>> This one which entered Paul's rcu-next tree.
>>
>> [1] http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41
>
> That's mine.
>
> It looks like the condition will be tested before it calls any rcu
> code. Which is why I was confused that it still gave a splat. Paul
> posted a patch before this that did the check outside the trace point.
>
> This one:
>
>  http://marc.info/?l=linux-kernel&m=142310961217650&w=2
>
>>
>> >> ( I did not build from scratch but re-invoking make "updated" the
>> >> files touched by Steven's patch, see attached build-log. )
>> >>
>> >> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>> >> ( It's good to see it's reproducible. )
>> >
>> > Was the tracepoint enabled? Or was there some other rcu call that
>> > triggered this. Or would cpu_online(smp_processor_id()) return true at
>> > this point?
>> >
>>
>> Thanks Steve for jumping into this one!
>>
>> Good point.
>> I looked at my kernel-config (which I already sent :-)).
>>
>> Do I need to enable...?
>>
>> # CONFIG_RCU_TRACE is not set
>>
>> ...or even more?
>>
>
> What I meant by the tracepoint being enabled, was not that it was
> configured in (I'm assuming it was), but that you started tracing?
>
>  echo 1 > /sys/kernel/debug/tracing/events/enable
>
> or
>
>  echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
>

NO, I did not start any tracing before doing my testing.

# cat /sys/kernel/debug/tracing/events/enable
0

# echo 1 > /sys/kernel/debug/tracing/events/enable

# cat /sys/kernel/debug/tracing/events/enable
X

# LC_ALL=C cat /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
cat: /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable: No such
file or directory

Looks like I need to enable...?

# CONFIG_DEBUG_TLBFLUSH is not set

To answer your question...

# cat /sys/devices/system/cpu/cpu1/online
1

# echo 0 > /sys/devices/system/cpu/cpu1/online

# cat /sys/devices/system/cpu/cpu1/online
0

[  375.337050] intel_pstate CPU 1 exiting
[  375.351069] smpboot: CPU 1 is now offline

So, this did not happen this time.

- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 21:07:27 +0100
Sedat Dilek  wrote:

> > Is this Paul's version of the patch or mine? If it is just mine, do you
> > know if Paul's version triggers this too?
> >
> 
> This one which entered Paul's rcu-next tree.
> 
> [1] http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41

That's mine.

It looks like the condition will be tested before it calls any rcu
code. Which is why I was confused that it still gave a splat. Paul
posted a patch before this that did the check outside the trace point.

This one:

 http://marc.info/?l=linux-kernel&m=142310961217650&w=2

> 
> >> ( I did not build from scratch but re-invoking make "updated" the
> >> files touched by Steven's patch, see attached build-log. )
> >>
> >> Unfortunately, the call-trace remains when doing an offlining of cpu1.
> >> ( It's good to see it's reproducible. )
> >
> > Was the tracepoint enabled? Or was there some other rcu call that
> > triggered this. Or would cpu_online(smp_processor_id()) return true at
> > this point?
> >
> 
> Thanks Steve for jumping into this one!
> 
> Good point.
> I looked at my kernel-config (which I already sent :-)).
> 
> Do I need to enable...?
> 
> # CONFIG_RCU_TRACE is not set
> 
> ...or even more?
>

What I meant by the tracepoint being enabled, was not that it was
configured in (I'm assuming it was), but that you started tracing?

 echo 1 > /sys/kernel/debug/tracing/events/enable

or

 echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable

-- Steve

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 8:58 PM, Steven Rostedt  wrote:
> On Thu, 5 Feb 2015 20:25:21 +0100
> Sedat Dilek  wrote:
>
>> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
>>  wrote:
>> > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
>> >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>> >> >> >   Did I actually need to be
>> >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
>> >> > Yep, you do need to offline at least one CPU to hit that splat.
>> >>
>> >> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
>> >
>> > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
>> > are your friends.  ;-)
>> >
>> > The problem is that I only run RCU-relevant combinations of Kconfigs,
>> > which means that I missed the ones that Sedat used to find this problem.
>> > So I guess it is a good thing that others run -next testing.
>> >
>>
>> [ Revived by a voltaren resinat pill... ]
>>
>> I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"
>> ...and...
>> applied "tlb: Don't do trace_tlb_flush() on offline CPUs"
>> ...in my build-dir.
>
> Is this Paul's version of the patch or mine? If it is just mine, do you
> know if Paul's version triggers this too?
>

The one that entered Paul's rcu-next tree:

[1] http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41

>> ( I did not build from scratch but re-invoking make "updated" the
>> files touched by Steven's patch, see attached build-log. )
>>
>> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>> ( It's good to see it's reproducible. )
>
> Was the tracepoint enabled? Or was there some other rcu call that
> triggered this. Or would cpu_online(smp_processor_id()) return true at
> this point?
>

Thanks Steve for jumping into this one!

Good point.
I looked at my kernel-config (which I already sent :-)).

Do I need to enable...?

# CONFIG_RCU_TRACE is not set

...or even more?

- Sedat -

> -- Steve
>
>>
>> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> [  121.652796] intel_pstate CPU 1 exiting
>> [  121.666272]
>> [  121.666274] ===
>> [  121.666274] [ INFO: suspicious RCU usage. ]
>> [  121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
>> [  121.666278] ---
>> [  121.666280] include/trace/events/tlb.h:37 suspicious
>> rcu_dereference_check() usage!
>> [  121.666281]
>> [  121.666281] other info that might help us debug this:
>> [  121.666281]
>> [  121.666282]
>> [  121.666282] RCU used illegally from offline CPU!
>> [  121.666282] rcu_scheduler_active = 1, debug_locks = 0
>> [  121.666283] no locks held by swapper/1/0.
>> [  121.666284]
>> [  121.666284] stack backtrace:
>> [  121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> 3.19.0-rc7-next-20150204.7-iniza-small #4
>> [  121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [  121.666293]  0001 88011a44fe18 817e39cd
>> 0011
>> [  121.666296]  88011a448290 88011a44fe48 810d6af7
>> 8800d3dfaac0
>> [  121.666299]  0001 81d32ce0 0005
>> 88011a44fe78
>> [  121.666300] Call Trace:
>> [  121.666308]  [] dump_stack+0x4c/0x65
>> [  121.666313]  [] lockdep_rcu_suspicious+0xe7/0x120
>> [  121.666318]  [] idle_task_exit+0x1c9/0x260
>> [  121.666322]  [] play_dead_common+0xe/0x50
>> [  121.666325]  [] native_play_dead+0x15/0x140
>> [  121.666330]  [] arch_cpu_idle_dead+0xf/0x20
>> [  121.666333]  [] cpu_startup_entry+0x37e/0x580
>> [  121.666336]  [] start_secondary+0x140/0x150
>> [  121.666744] smpboot: CPU 1 is now offline
>>
>> From rcu point this is now safe?
>> But another area (linux-pm?) is still affected?
>> I will try to test "vanilla" pm-next if the problem exists with
>> intel_pstate as suggested by Rafael.
>> Hmmm, not sure how I can get the pm-next code which went into
>> next-20150204 as linux-pm.git#linux-next was fed with new stuff.
>>
>>
>> - Sedat -
>

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 20:25:21 +0100
Sedat Dilek  wrote:

> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
>  wrote:
> > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
> >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
> >> >> >   Did I actually need to be
> >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
> >> > Yep, you do need to offline at least one CPU to hit that splat.
> >>
> >> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
> >
> > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
> > are your friends.  ;-)
> >
> > The problem is that I only run RCU-relevant combinations of Kconfigs,
> > which means that I missed the ones that Sedat used to find this problem.
> > So I guess it is a good thing that others run -next testing.
> >
> 
> [ Revived by a voltaren resinat pill... ]
> 
> I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"
> ...and...
> applied "tlb: Don't do trace_tlb_flush() on offline CPUs"
> ...in my build-dir.

Is this Paul's version of the patch or mine? If it is just mine, do you
know if Paul's version triggers this too?

> ( I did not build from scratch but re-invoking make "updated" the
> files touched by Steven's patch, see attached build-log. )
> 
> Unfortunately, the call-trace remains when doing an offlining of cpu1.
> ( It's good to see it's reproducible. )

Was the tracepoint enabled? Or was there some other rcu call that
triggered this. Or would cpu_online(smp_processor_id()) return true at
this point?

-- Steve

> 
> root# echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> [  121.652796] intel_pstate CPU 1 exiting
> [  121.666272]
> [  121.666274] ===
> [  121.666274] [ INFO: suspicious RCU usage. ]
> [  121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
> [  121.666278] ---
> [  121.666280] include/trace/events/tlb.h:37 suspicious
> rcu_dereference_check() usage!
> [  121.666281]
> [  121.666281] other info that might help us debug this:
> [  121.666281]
> [  121.666282]
> [  121.666282] RCU used illegally from offline CPU!
> [  121.666282] rcu_scheduler_active = 1, debug_locks = 0
> [  121.666283] no locks held by swapper/1/0.
> [  121.666284]
> [  121.666284] stack backtrace:
> [  121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> 3.19.0-rc7-next-20150204.7-iniza-small #4
> [  121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [  121.666293]  0001 88011a44fe18 817e39cd
> 0011
> [  121.666296]  88011a448290 88011a44fe48 810d6af7
> 8800d3dfaac0
> [  121.666299]  0001 81d32ce0 0005
> 88011a44fe78
> [  121.666300] Call Trace:
> [  121.666308]  [] dump_stack+0x4c/0x65
> [  121.666313]  [] lockdep_rcu_suspicious+0xe7/0x120
> [  121.666318]  [] idle_task_exit+0x1c9/0x260
> [  121.666322]  [] play_dead_common+0xe/0x50
> [  121.666325]  [] native_play_dead+0x15/0x140
> [  121.666330]  [] arch_cpu_idle_dead+0xf/0x20
> [  121.666333]  [] cpu_startup_entry+0x37e/0x580
> [  121.666336]  [] start_secondary+0x140/0x150
> [  121.666744] smpboot: CPU 1 is now offline
> 
> From rcu point this is now safe?
> But another area (linux-pm?) is still affected?
> I will try to test "vanilla" pm-next if the problem exists with
> intel_pstate as suggested by Rafael.
> Hmmm, not sure how I can get the pm-next code which went into
> next-20150204 as linux-pm.git#linux-next was fed with new stuff.
> 
> 
> - Sedat -


Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 4:17 AM, Martin K. Petersen
 wrote:
>> "Sedat" == Sedat Dilek  writes:
>
> Sedat> No, but I am here on a so-called WUBI installation which
> Sedat> triggered some bugs being an exotic installation.  My
> Sedat> Ubuntu/precise is a 18GiB image laying on my Win7 partition
> Sedat> (/dev/sda2).
>
> I've been mulling over this for a while and can't come up with a good
> approach. So let's just nuke these warnings.
>
> --
> Martin K. Petersen  Oracle Linux Engineering
>
>
> block: Quiesce zeroout wrapper
>
> blkdev_issue_zeroout() printed a warning if a device failed a discard or
> write same request despite advertising support for these. That's fine
> for SCSI since we'll disable these commands if we get an error back from
> the disk saying that they are not supported. And consequently the
> warning only gets printed once.
>
> There are other types of block devices that support discard, however,
> and these may return -EOPNOTSUPP for each command but leave discard
> enabled in the queue limits. This will cause a warning message for every
> blkdev_issue_zeroout() invocation.
>
> Remove the offending warning messages.
>
> Reported-by: Sedat Dilek 
> Signed-off-by: Martin K. Petersen 
> ---
>  block/blk-lib.c |   26 +++---
>  1 file changed, 7 insertions(+), 19 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 715e948f58a4..7688ee3f5d72 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>   * @discard:   whether to discard the block range
>   *
>   * Description:
> -
>   *  Zero-fill a block range.  If the discard flag is set and the block
>   *  device guarantees that subsequent READ operations to the block range
>   *  in question will return zeroes, the blocks will be discarded. Should
> @@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  sector_t nr_sects, gfp_t gfp_mask, bool discard)
>  {
> struct request_queue *q = bdev_get_queue(bdev);
> -   unsigned char bdn[BDEVNAME_SIZE];
> -
> -   if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {
>
> -   if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
> -   return 0;
> -
> -   bdevname(bdev, bdn);
> -   pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
> -   }
> +   if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
> +   blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
> +   return 0;
>
> -   if (bdev_write_same(bdev)) {
> -
> -   if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> -ZERO_PAGE(0)))
> -   return 0;
> -
> -   bdevname(bdev, bdn);
> -   pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
> -   }
> +   if (bdev_write_same(bdev) &&
> +   blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> +   ZERO_PAGE(0)) == 0)
> +   return 0;
>
> return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
>  }
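
For readers following along, the control flow of the patched blkdev_issue_zeroout() can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct model_bdev`, `enum model_path` and `model_zeroout()` are hypothetical names, and the stored result fields stand in for the return values of blkdev_issue_discard() and blkdev_issue_write_same().

```c
#include <stdbool.h>

/* Hypothetical model of a block device; not the kernel's struct block_device. */
struct model_bdev {
    bool discard_zeroes;     /* discard advertised and reads back as zeroes */
    bool write_same;         /* WRITE SAME advertised */
    int  discard_result;     /* 0 on success, negative errno otherwise */
    int  write_same_result;  /* 0 on success, negative errno otherwise */
};

/* Which method ended up zeroing the range. */
enum model_path { MODEL_DISCARD, MODEL_WRITE_SAME, MODEL_MANUAL };

/* Models the patched function: on failure, fall through without a warning. */
static enum model_path model_zeroout(const struct model_bdev *bdev, bool discard)
{
    /* First choice: discard, if requested and it guarantees zeroes. */
    if (discard && bdev->discard_zeroes && bdev->discard_result == 0)
        return MODEL_DISCARD;

    /* Second choice: WRITE SAME with a zero page. */
    if (bdev->write_same && bdev->write_same_result == 0)
        return MODEL_WRITE_SAME;

    /* Last resort: write zero pages explicitly (__blkdev_issue_zeroout()). */
    return MODEL_MANUAL;
}
```

A device that advertises discard but fails every request (for example with -EOPNOTSUPP) simply lands on the manual path each time, which is the case that used to produce one warning per call.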

Martin, will you send a separate patch for that?

Thanks.

- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 8:33 PM, Paul E. McKenney
 wrote:
> On Thu, Feb 05, 2015 at 08:25:21PM +0100, Sedat Dilek wrote:
>> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
>>  wrote:
>> > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
>> >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>> >> >> >   Did I actually need to be
>> >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
>> >> > Yep, you do need to offline at least one CPU to hit that splat.
>> >>
>> >> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
>> >
>> > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
>> > are your friends.  ;-)
>> >
>> > The problem is that I only run RCU-relevant combinations of Kconfigs,
>> > which means that I missed the ones that Sedat used to find this problem.
>> > So I guess it is a good thing that others run -next testing.
>> >
>>
>> [ Revived by a voltaren resinat pill... ]
>>
>> I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"
>> ...and...
>> applied "tlb: Don't do trace_tlb_flush() on offline CPUs"
>> ...in my build-dir.
>> ( I did not build from scratch but re-invoking make "updated" the
>> files touched by Steven's patch, see attached build-log. )
>>
>> Unfortunately, the call-trace remains when doing an offlining of cpu1.
>> ( It's good to see it's reproducible. )
>>
>> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> [  121.652796] intel_pstate CPU 1 exiting
>> [  121.666272]
>> [  121.666274] ===
>> [  121.666274] [ INFO: suspicious RCU usage. ]
>> [  121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
>> [  121.666278] ---
>> [  121.666280] include/trace/events/tlb.h:37 suspicious
>> rcu_dereference_check() usage!
>> [  121.666281]
>> [  121.666281] other info that might help us debug this:
>> [  121.666281]
>> [  121.666282]
>> [  121.666282] RCU used illegally from offline CPU!
>> [  121.666282] rcu_scheduler_active = 1, debug_locks = 0
>> [  121.666283] no locks held by swapper/1/0.
>> [  121.666284]
>> [  121.666284] stack backtrace:
>> [  121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> 3.19.0-rc7-next-20150204.7-iniza-small #4
>> [  121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [  121.666293]  0001 88011a44fe18 817e39cd
>> 0011
>> [  121.666296]  88011a448290 88011a44fe48 810d6af7
>> 8800d3dfaac0
>> [  121.666299]  0001 81d32ce0 0005
>> 88011a44fe78
>> [  121.666300] Call Trace:
>> [  121.666308]  [] dump_stack+0x4c/0x65
>> [  121.666313]  [] lockdep_rcu_suspicious+0xe7/0x120
>> [  121.666318]  [] idle_task_exit+0x1c9/0x260
>> [  121.666322]  [] play_dead_common+0xe/0x50
>> [  121.666325]  [] native_play_dead+0x15/0x140
>> [  121.666330]  [] arch_cpu_idle_dead+0xf/0x20
>> [  121.666333]  [] cpu_startup_entry+0x37e/0x580
>> [  121.666336]  [] start_secondary+0x140/0x150
>> [  121.666744] smpboot: CPU 1 is now offline
>>
>> >From rcu point this is now safe?
>> But another area (linux-pm?) is still affected?
>> I will try to test "vanilla" pm-next if the problem exists with
>> intel_pstate as suggested by Rafael.
>> Hmmm, not sure how I can get the pm-next code which went into
>> next-20150204 as linux-pm.git#linux-next was feeded with new stuff.
>
> At this point, I am starting to think in terms of moving the new
> CPU_DYING_IDLE notification later in the offline sequence.  This will
> take me a bit to get set up correctly, but I hope to have a patch
> some time tomorrow (Friday), Pacific time.
>

Is the CPU_DYING_IDLE notification part of the RCU area?

Shall I test pm-next?

By looking at [1] I got the commit-id/sha1 which went into next-20150204.

n102: pm 12f24f2d78ce801c9330c5f682b7beb215bdbab1

If this helps you, I will do it.

"For Paul" :-)
- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Next/SHA1s?id=next-20150204#n102
[2] http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=12f24f2d78ce801c9330c5f682b7beb215bdbab1

> Thanx, Paul
>
>> - Sedat -
>
>>   CHK include/config/kernel.release
>> make KBUILD_SRC=
>>   CHK include/config/kernel.release
>>   CHK include/generated/uapi/linux/version.h
>>   CHK include/generated/utsrelease.h
>>   CALLscripts/checksyscalls.sh
>>   CHK include/generated/compile.h
>>   CC  arch/x86/mm/init.o
>>   CC  arch/x86/mm/init_64.o
>>   CC  mm/mmu_context.o
>>   CC  kernel/fork.o
>>   CC  arch/x86/kernel/process_64.o
>>   CC  mm/mmap.o
>>   CC  arch/x86/kernel/ldt.o
>>   CC  arch/x86/mm/tlb.o
>>   CC  arch/x86/kernel/setup.o
>>   LD  arch/x86/mm/built-in.o
>>   CC  kernel/exit.o
>>   LD

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 08:25:21PM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
>  wrote:
> > On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
> >> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
> >> >> >   Did I actually need to be
> >> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
> >> > Yep, you do need to offline at least one CPU to hit that splat.
> >>
> >> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
> >
> > For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
> > are your friends.  ;-)
> >
> > The problem is that I only run RCU-relevant combinations of Kconfigs,
> > which means that I missed the ones that Sedat used to find this problem.
> > So I guess it is a good thing that others run -next testing.
> >
> 
> [ Revived by a voltaren resinat pill... ]
> 
> I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"
> ...and...
> applied "tlb: Don't do trace_tlb_flush() on offline CPUs"
> ...in my build-dir.
> ( I did not build from scratch but re-invoking make "updated" the
> files touched by Steven's patch, see attached build-log. )
> 
> Unfortunately, the call-trace remains when doing an offlining of cpu1.
> ( It's good to see it's reproducible. )
> 
> root# echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> [  121.652796] intel_pstate CPU 1 exiting
> [  121.666272]
> [  121.666274] ===
> [  121.666274] [ INFO: suspicious RCU usage. ]
> [  121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
> [  121.666278] ---
> [  121.666280] include/trace/events/tlb.h:37 suspicious
> rcu_dereference_check() usage!
> [  121.666281]
> [  121.666281] other info that might help us debug this:
> [  121.666281]
> [  121.666282]
> [  121.666282] RCU used illegally from offline CPU!
> [  121.666282] rcu_scheduler_active = 1, debug_locks = 0
> [  121.666283] no locks held by swapper/1/0.
> [  121.666284]
> [  121.666284] stack backtrace:
> [  121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> 3.19.0-rc7-next-20150204.7-iniza-small #4
> [  121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [  121.666293]  0001 88011a44fe18 817e39cd
> 0011
> [  121.666296]  88011a448290 88011a44fe48 810d6af7
> 8800d3dfaac0
> [  121.666299]  0001 81d32ce0 0005
> 88011a44fe78
> [  121.666300] Call Trace:
> [  121.666308]  [] dump_stack+0x4c/0x65
> [  121.666313]  [] lockdep_rcu_suspicious+0xe7/0x120
> [  121.666318]  [] idle_task_exit+0x1c9/0x260
> [  121.666322]  [] play_dead_common+0xe/0x50
> [  121.666325]  [] native_play_dead+0x15/0x140
> [  121.666330]  [] arch_cpu_idle_dead+0xf/0x20
> [  121.666333]  [] cpu_startup_entry+0x37e/0x580
> [  121.666336]  [] start_secondary+0x140/0x150
> [  121.666744] smpboot: CPU 1 is now offline
> 
> >From rcu point this is now safe?
> But another area (linux-pm?) is still affected?
> I will try to test "vanilla" pm-next if the problem exists with
> intel_pstate as suggested by Rafael.
> Hmmm, not sure how I can get the pm-next code which went into
> next-20150204 as linux-pm.git#linux-next was feeded with new stuff.

At this point, I am starting to think in terms of moving the new
CPU_DYING_IDLE notification later in the offline sequence.  This will
take me a bit to get set up correctly, but I hope to have a patch
some time tomorrow (Friday), Pacific time.

Thanx, Paul

> - Sedat -

>   CHK include/config/kernel.release
> make KBUILD_SRC=
>   CHK include/config/kernel.release
>   CHK include/generated/uapi/linux/version.h
>   CHK include/generated/utsrelease.h
>   CALLscripts/checksyscalls.sh
>   CHK include/generated/compile.h
>   CC  arch/x86/mm/init.o
>   CC  arch/x86/mm/init_64.o
>   CC  mm/mmu_context.o
>   CC  kernel/fork.o
>   CC  arch/x86/kernel/process_64.o
>   CC  mm/mmap.o
>   CC  arch/x86/kernel/ldt.o
>   CC  arch/x86/mm/tlb.o
>   CC  arch/x86/kernel/setup.o
>   LD  arch/x86/mm/built-in.o
>   CC  kernel/exit.o
>   LD  mm/built-in.o
>   CC  arch/x86/xen/mmu.o
>   CC  arch/x86/kernel/apic/ipi.o
>   CC  fs/exec.o
>   LD  arch/x86/kernel/apic/built-in.o
>   CC  kernel/power/snapshot.o
>   CC  arch/x86/kernel/cpu/common.o
>   LD  kernel/power/built-in.o
>   LD  arch/x86/xen/built-in.o
>   CC  kernel/sched/core.o
>   LD  arch/x86/kernel/cpu/built-in.o
>   CC  arch/x86/kernel/smp.o
>   CC  arch/x86/kernel/machine_kexec_64.o
>   LD  arch/x86/kernel/built-in.o
>   LD  arch/x86/built-in.o
>   LD  kernel/sched/built-in.o
>   CC  kernel/module.o
>   CC  fs/compat.o
>   CHK kernel/config_data.h
>   LD

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
> >> >   Did I actually need to be
> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
> > Yep, you do need to offline at least one CPU to hit that splat.
> 
> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)

For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
are your friends.  ;-)

The problem is that I only run RCU-relevant combinations of Kconfigs,
which means that I missed the ones that Sedat used to find this problem.
So I guess it is a good thing that others run -next testing.
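
For a quick userspace approximation of the offline/online cycling asked about here, a minimal C sketch follows. It only automates the `echo 0 > /sys/devices/system/cpu/cpuN/online` step shown earlier in the thread and is not how rcutorture or locktorture work internally. The helper names (`cpu_online_path`, `write_online_flag`, `cycle_cpu`) are hypothetical, and writing to sysfs needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of a CPU's "online" control file.
 * Returns 0 on success, -1 if the buffer is too small. */
int cpu_online_path(char *buf, size_t len, int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "0" or "1" to the given control file.
 * Returns 0 on success, -1 on any failure. */
int write_online_flag(const char *path, int online)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", online ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Take one CPU offline and bring it back (hypothetical helper). */
int cycle_cpu(int cpu)
{
	char path[64];

	if (cpu_online_path(path, sizeof(path), cpu) != 0)
		return -1;
	if (write_online_flag(path, 0) != 0)
		return -1;
	return write_online_flag(path, 1);
}
```

Calling `cycle_cpu(1)` in a loop while watching dmesg would exercise the same path as the manual test, assuming the kernel was built with CPU hotplug support and CPU 1 can be taken offline.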

Thanx, Paul


Re: linux-next: Tree for Feb 4

On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>> >   Did I actually need to be
>> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
> Yep, you do need to offline at least one CPU to hit that splat.

Heh, do we need a debugging mode that will randomly offline/online CPUs? :)

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 10:11:31AM -0800, Dave Hansen wrote:
> On 02/05/2015 10:08 AM, Steven Rostedt wrote:
> > --- a/include/trace/events/tlb.h
> > +++ b/include/trace/events/tlb.h
> > @@ -13,11 +13,13 @@
> > { TLB_LOCAL_SHOOTDOWN,  "local shootdown" },\
> > { TLB_LOCAL_MM_SHOOTDOWN,   "local mm shootdown" }
> >  
> > -TRACE_EVENT(tlb_flush,
> > +TRACE_EVENT_CONDITION(tlb_flush,
> >  
> > TP_PROTO(int reason, unsigned long pages),
> > TP_ARGS(reason, pages),
> >  
> > +   TP_CONDITION(cpu_online(smp_processor_id())),
> 
> That's a pretty reasonable fix, although it would be nice if the
> debugging was easier to hit.

Looks very good to me!

Unless someone else speaks up, I will carry this patch.

>   Did I actually need to be
> onlining/offlining CPUs to hit the splat that Sedat was reporting?

Yep, you do need to offline at least one CPU to hit that splat.

Thanx, Paul


Re: linux-next: Tree for Feb 4

On 02/05/2015 10:08 AM, Steven Rostedt wrote:
> --- a/include/trace/events/tlb.h
> +++ b/include/trace/events/tlb.h
> @@ -13,11 +13,13 @@
>   { TLB_LOCAL_SHOOTDOWN,  "local shootdown" },\
>   { TLB_LOCAL_MM_SHOOTDOWN,   "local mm shootdown" }
>  
> -TRACE_EVENT(tlb_flush,
> +TRACE_EVENT_CONDITION(tlb_flush,
>  
>   TP_PROTO(int reason, unsigned long pages),
>   TP_ARGS(reason, pages),
>  
> + TP_CONDITION(cpu_online(smp_processor_id())),

That's a pretty reasonable fix, although it would be nice if the
debugging was easier to hit.  Did I actually need to be
onlining/offlining CPUs to hit the splat that Sedat was reporting?




Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 13:03:43 -0500
Steven Rostedt  wrote:

> (not tested)
> 
> Signed-off-by: Steven Rostedt 
> ---
> diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
> index 13391d288107..040c1cdfe6d1 100644
> --- a/include/trace/events/tlb.h
> +++ b/include/trace/events/tlb.h
> @@ -13,11 +13,13 @@
>   { TLB_LOCAL_SHOOTDOWN,  "local shootdown" },\
>   { TLB_LOCAL_MM_SHOOTDOWN,   "local mm shootdown" }
>  
> -TRACE_EVENT(tlb_flush,
> +TRACE_EVENT_CONDITION(tlb_flush,
>  
>   TP_PROTO(int reason, unsigned long pages),
>   TP_ARGS(reason, pages),
>  
> + TP_CONDITION(cpu_online(smp_processor_id()),
> +

I said it wasn't tested. I also forgot to hit save after I realized I
was missing a ')'.

-- Steve

Take two:

diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..0e7635765153 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
{ TLB_LOCAL_SHOOTDOWN,  "local shootdown" },\
{ TLB_LOCAL_MM_SHOOTDOWN,   "local mm shootdown" }
 
-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,
 
TP_PROTO(int reason, unsigned long pages),
TP_ARGS(reason, pages),
 
+   TP_CONDITION(cpu_online(smp_processor_id())),
+
TP_STRUCT__entry(
__field(  int, reason)
__field(unsigned long,  pages)
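
The difference between the open-coded check in switch_mm() and the TP_CONDITION() variant can be modelled in plain userspace C. This is only a sketch under stated assumptions: `tp_enabled` stands in for the tracepoint's static key, `model_cpu_online()` for cpu_online(smp_processor_id()), and the counters exist purely to make the behaviour observable. None of these names are kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical model state, not kernel symbols. */
bool tp_enabled;        /* stands in for the tracepoint's static key */
bool cpu_is_online;     /* stands in for cpu_online(smp_processor_id()) */
int cond_evaluations;   /* how many times the online check was evaluated */
int events_recorded;    /* how many times the probe body ran */

bool model_cpu_online(void)
{
	cond_evaluations++;
	return cpu_is_online;
}

/* Open-coded variant, as in the switch_mm() patch: the caller pays for
 * the online check on every call, whether or not tracing is enabled. */
void trace_open_coded(void)
{
	if (model_cpu_online()) {
		if (tp_enabled)
			events_recorded++;
	}
}

/* TP_CONDITION-style variant: the online check runs only after the
 * tracepoint is known to be enabled. */
void trace_conditional(void)
{
	if (!tp_enabled)
		return;
	if (!model_cpu_online())
		return;
	events_recorded++;
}

/* Reset the model to a known state before each experiment. */
void model_reset(bool enabled, bool online)
{
	tp_enabled = enabled;
	cpu_is_online = online;
	cond_evaluations = 0;
	events_recorded = 0;
}
```

With tracing disabled, `trace_open_coded()` still evaluates the online check once per call, while `trace_conditional()` evaluates it zero times. That is the overhead argument made above for moving the check into the tracepoint definition.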

Re: linux-next: Tree for Feb 4

On Wed, 04 Feb 2015 23:14:55 -0800
Dave Hansen  wrote:

> On 02/04/2015 05:53 PM, Sedat Dilek wrote:
> > The architecture-specific switch_mm() function can be called by offline
> > CPUs, but includes event tracing, which cannot be legally carried out
> > on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
> > this splat by omitting the tracing when the CPU is offline.
> ...
> >>> >> > load_cr3(next->pgd);
> >>> >> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
> >>> >> > TLB_FLUSH_ALL);
> >>> >> > +   if (cpu_online(smp_processor_id()))
> >>> >> > +   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
> >>> >> > TLB_FLUSH_ALL);
> 
> Is this, perhaps, something that we should be doing in the generic trace
> code so that all of the trace users don't have to worry about it?  Also,
> this patch will add overhead to the code when tracing is off.  It would
> be best if we could manage to make the cpu_online() check only in the
> cases where the tracepoint is on.

Note, we can move the check into the code that enables or disables
trace points. I believe, the rcu part of a tracepoint is only the call
to the callbacks. The jump_label part should be safe outside of rcu.

In that case, instead, have this, which does exactly the same thing
without having any overhead of the branch when tracing is disabled:

(not tested)

Signed-off-by: Steven Rostedt 
---
diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..040c1cdfe6d1 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
{ TLB_LOCAL_SHOOTDOWN,  "local shootdown" },\
{ TLB_LOCAL_MM_SHOOTDOWN,   "local mm shootdown" }
 
-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,
 
TP_PROTO(int reason, unsigned long pages),
TP_ARGS(reason, pages),
 
+   TP_CONDITION(cpu_online(smp_processor_id()),
+
TP_STRUCT__entry(
__field(  int, reason)
__field(unsigned long,  pages)

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 03:57:12PM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 8:14 AM, Dave Hansen  wrote:
> > On 02/04/2015 05:53 PM, Sedat Dilek wrote:
> >> The architecture-specific switch_mm() function can be called by offline
> >> CPUs, but includes event tracing, which cannot be legally carried out
> >> on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
> >> this splat by omitting the tracing when the CPU is offline.
> > ...
>  >> > load_cr3(next->pgd);
>  >> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
>  >> > TLB_FLUSH_ALL);
>  >> > +   if (cpu_online(smp_processor_id()))
>  >> > +   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
>  >> > TLB_FLUSH_ALL);
> >
> > Is this, perhaps, something that we should be doing in the generic trace
> > code so that all of the trace users don't have to worry about it?  Also,
> > this patch will add overhead to the code when tracing is off.  It would
> > be best if we could manage to make the cpu_online() check only in the
> > cases where the tracepoint is on.
> 
> Hi Dave,
> 
> thanks for your feedback.
> 
> I have just seen that I again see the call-trace.

When you get well, could you please send that call trace?

> Maybe you can discuss with Paul and others or offer a proposal patch.

The other possibility is to have a CONFIG_ARCH_DYING_IDLE or some such
that allows this particular flavor of x86 to invoke the CPU_DYING_IDLE
from after the call to switch_mm().  Dave, does that make sense?

My guess would be that there should be a cpu_dying_idle_generic() invoked
from cpu_idle_loop(), and a cpu_dying_idle_native() invoked at the end
of idle_task_exit().  Or can I get away with just moving the current
rcu_notify_cpu() call from cpu_idle_loop() to the end of idle_task_exit()?

A quick look at the calls to idle_task_exit() makes this look plausible.
There are a number of calls to printk() and to complete() that need help,
but that is a pre-existing issue in any case, as both these code paths
have RCU readers that are having no effect on offline CPUs.

Dave, thoughts?

> I should really do something for my recovery (influenza).
> Instead of laying lazy in my bed I thought to update my Linux kernels
> and graphics driver stack which made me happy.

Get well, being sick is bad for your health!  ;-)

Thanx, Paul

> Regards,
> - Sedat -
> 


Re: linux-next: Tree for Feb 4

On Wed, Feb 04, 2015 at 11:14:55PM -0800, Dave Hansen wrote:
> On 02/04/2015 05:53 PM, Sedat Dilek wrote:
> > The architecture-specific switch_mm() function can be called by offline
> > CPUs, but includes event tracing, which cannot be legally carried out
> > on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
> > this splat by omitting the tracing when the CPU is offline.
> ...
> >>> >> > load_cr3(next->pgd);
> >>> >> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
> >>> >> > TLB_FLUSH_ALL);
> >>> >> > +   if (cpu_online(smp_processor_id()))
> >>> >> > +   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
> >>> >> > TLB_FLUSH_ALL);
> 
> Is this, perhaps, something that we should be doing in the generic trace
> code so that all of the trace users don't have to worry about it?  Also,
> this patch will add overhead to the code when tracing is off.  It would
> be best if we could manage to make the cpu_online() check only in the
> cases where the tracepoint is on.

I considered doing this in the _rcuidle piece of the trace code, but
unlike the RCU idle exit/entry in the _rcuidle stuff, the work required
to get through the RCU online/offline code is pretty heavyweight.
You end up having 16 CPUs contending for an rcu_node lock, for example.

But maybe you are instead suggesting pushing only the cpu_online() check
into the trace infrastructure.  If so, fair point, and I will take a
look at this.

Thanx, Paul


Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 8:14 AM, Dave Hansen  wrote:
> On 02/04/2015 05:53 PM, Sedat Dilek wrote:
>> The architecture-specific switch_mm() function can be called by offline
>> CPUs, but includes event tracing, which cannot be legally carried out
>> on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
>> this splat by omitting the tracing when the CPU is offline.
> ...
 >> > load_cr3(next->pgd);
 >> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
 >> > TLB_FLUSH_ALL);
 >> > +   if (cpu_online(smp_processor_id()))
 >> > +   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
 >> > TLB_FLUSH_ALL);
>
> Is this, perhaps, something that we should be doing in the generic trace
> code so that all of the trace users don't have to worry about it?  Also,
> this patch will add overhead to the code when tracing is off.  It would
> be best if we could manage to make the cpu_online() check only in the
> cases where the tracepoint is on.

Hi Dave,

thanks for your feedback.

I have just noticed that I am seeing the call trace again.

Maybe you can discuss this with Paul and others, or propose a patch.

I should really do something for my recovery (influenza).
Instead of lying lazily in bed, I decided to update my Linux kernels
and graphics driver stack, which made me happy.


Regards,
- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 9:22 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Thu, 5 Feb 2015 21:07:27 +0100
 Sedat Dilek sedat.di...@gmail.com wrote:

  Is this Paul's version of the patch or mine? If it is just mine, do you
  know if Paul's version triggers this too?
 

 This one which entered Pauls rcu-next tree.

 [1] http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41

 That's mine.

 It looks like the condition will be tested before it calls any rcu
 code. Which is why I was confused that it still gave a splat. Paul
 posted a patch before this that did the check outside the trace point.

 This one:

  http://marc.info/?l=linux-kernel&m=142310961217650&w=2


  ( I did not build from scratch but re-invoking make updated the
  files touched by Steven's patch, see attached build-log. )
 
  Unfortunately, the call-trace remains when doing an offlining of cpu1.
  ( It's good to see it's reproducible. )
 
  Was the tracepoint enabled? Or was there some other rcu call that
  triggered this. Or would cpu_online(smp_processor_id()) return true at
  this point?
 

 Thanks Steve for jumping into this one!

 Good point.
 I looked at my kernel-config (which I already sent :-)).

 Do I need to enable...?

 # CONFIG_RCU_TRACE is not set

 ...or even more?


 What I meant by the tracepoint being enabled, was not that it was
 configured in (I'm assuming it was), but that you started tracing?

  echo 1 > /sys/kernel/debug/tracing/events/enable

 or

  echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable


NO, I did not start any tracing before doing my testing.

# cat /sys/kernel/debug/tracing/events/enable
0

# echo 1 > /sys/kernel/debug/tracing/events/enable

# cat /sys/kernel/debug/tracing/events/enable
X

# LC_ALL=C cat /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
cat: /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable: No such
file or directory

Looks like I need to enable...?

# CONFIG_DEBUG_TLBFLUSH is not set

To answer your question...

# cat /sys/devices/system/cpu/cpu1/online
1

# echo 0 > /sys/devices/system/cpu/cpu1/online

# cat /sys/devices/system/cpu/cpu1/online
0

[  375.337050] intel_pstate CPU 1 exiting
[  375.351069] smpboot: CPU 1 is now offline

So, this did not happen this time.

- Sedat -

Re: linux-next: Tree for Feb 4

[...]
  Unfortunately, the call-trace remains when doing an offlining of cpu1.
  ( It's good to see it's reproducible. )
 
  Was the tracepoint enabled? Or was there some other rcu call that
  triggered this. Or would cpu_online(smp_processor_id()) return true at
  this point?
 

 Thanks Steve for jumping into this one!

 Good point.
 I looked at my kernel-config (which I already sent :-)).

 Do I need to enable...?

 # CONFIG_RCU_TRACE is not set

 ...or even more?


 What I meant by the tracepoint being enabled, was not that it was
 configured in (I'm assuming it was), but that you started tracing?

  echo 1 > /sys/kernel/debug/tracing/events/enable

 or

  echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable


 NO, I did not start any tracing before doing my testing.

 # cat /sys/kernel/debug/tracing/events/enable
 0

 # echo 1 > /sys/kernel/debug/tracing/events/enable

 # cat /sys/kernel/debug/tracing/events/enable
 X

 # LC_ALL=C cat /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable
 cat: /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable: No such
 file or directory

 Looks like I need to enable...?

 # CONFIG_DEBUG_TLBFLUSH is not set


Here my new kernel-config (not sure if I really need them to be enabled):

$ ./scripts/diffconfig
/boot/config-3.19.0-rc7-next-20150204.7-iniza-small
/boot/config-3.19.0-rc7-next-20150204.9-iniza-small
 DEBUG_TLBFLUSH n -> y
 RCU_TRACE n -> y
 TREE_RCU_TRACE n -> y

Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:

# cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
1

[  391.090381] intel_pstate CPU 1 exiting
[  391.104491] smpboot: CPU 1 is now offline

- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Thu, 5 Feb 2015 22:45:59 +0100
 Sedat Dilek sedat.di...@gmail.com wrote:

 Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:

 Heh, yeah, I typed that entire line in by hand. Just be lucky that was
 the only typo ;-)


 # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
 1

 [  391.090381] intel_pstate CPU 1 exiting
 [  391.104491] smpboot: CPU 1 is now offline


 Now, if you disable that (echo 0 to that file), do you still get the
 rcu lockdep splat if you suspend and resume?


YES, I get the call-trace again!

# cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
0

# cat /sys/devices/system/cpu/cpu1/online
0

[ 2470.606222] intel_pstate CPU 1 exiting
[ 2470.628153]
[ 2470.628155] ===
[ 2470.628156] [ INFO: suspicious RCU usage. ]
[ 2470.628159] 3.19.0-rc7-next-20150204.9-iniza-small #1 Not tainted
[ 2470.628160] ---
[ 2470.628162] include/trace/events/tlb.h:37 suspicious
rcu_dereference_check() usage!
[ 2470.628163]
[ 2470.628163] other info that might help us debug this:
[ 2470.628163]
[ 2470.628164]
[ 2470.628164] RCU used illegally from offline CPU!
[ 2470.628164] rcu_scheduler_active = 1, debug_locks = 0
[ 2470.628165] no locks held by swapper/1/0.
[ 2470.628166]
[ 2470.628166] stack backtrace:
[ 2470.628169] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
3.19.0-rc7-next-20150204.9-iniza-small #1
[ 2470.628171] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 2470.628176]  0001 88011a44fe18 817ecf4d
0011
[ 2470.628179]  88011a448290 88011a44fe48 810d6b57
8800cc2660c0
[ 2470.628182]  0001 81d35160 0002
88011a44fe78
[ 2470.628183] Call Trace:
[ 2470.628192]  [817ecf4d] dump_stack+0x4c/0x65
[ 2470.628198]  [810d6b57] lockdep_rcu_suspicious+0xe7/0x120
[ 2470.628203]  [810b7459] idle_task_exit+0x1c9/0x260
[ 2470.628208]  [81054c4e] play_dead_common+0xe/0x50
[ 2470.628211]  [81054ca5] native_play_dead+0x15/0x140
[ 2470.628216]  [8102963f] arch_cpu_idle_dead+0xf/0x20
[ 2470.628219]  [810cdbae] cpu_startup_entry+0x37e/0x580
[ 2470.628222] smpboot: CPU 1 didn't die...
[ 2470.628224]  [81053e20] start_secondary+0x140/0x150

- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 22:45:59 +0100
Sedat Dilek sedat.di...@gmail.com wrote:

 Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:

Heh, yeah, I typed that entire line in by hand. Just be lucky that was
the only typo ;-)

 
 # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
 1
 
 [  391.090381] intel_pstate CPU 1 exiting
 [  391.104491] smpboot: CPU 1 is now offline
 

Now, if you disable that (echo 0 to that file), do you still get the
rcu lockdep splat if you suspend and resume?

-- Steve

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 23:16:21 +0100
Sedat Dilek sedat.di...@gmail.com wrote:

 On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt rost...@goodmis.org wrote:
  On Thu, 5 Feb 2015 22:45:59 +0100
  Sedat Dilek sedat.di...@gmail.com wrote:
 
  Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
 
  Heh, yeah, I typed that entire line in by hand. Just be lucky that was
  the only typo ;-)
 
 
  # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
  1
 
  [  391.090381] intel_pstate CPU 1 exiting
  [  391.104491] smpboot: CPU 1 is now offline
 
 
  Now, if you disable that (echo 0 to that file), do you still get the
  rcu lockdep splat if you suspend and resume?
 
 
 YES, I get the call-trace again!
 

Bah! I see where the warning comes from. In include/linux/tracepoint.h
we have:

#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
	extern struct tracepoint __tracepoint_##name;			\
	static inline void trace_##name(proto)				\
	{								\
		if (static_key_false(&__tracepoint_##name.key))		\
			__DO_TRACE(&__tracepoint_##name,		\
				TP_PROTO(data_proto),			\
				TP_ARGS(data_args),			\
				TP_CONDITION(cond),,);			\
		if (IS_ENABLED(CONFIG_LOCKDEP)) {			\
			rcu_read_lock_sched_notrace();			\
			rcu_dereference_sched(__tracepoint_##name.funcs);\
			rcu_read_unlock_sched_notrace();		\
		}							\
	}								\

See that if (IS_ENABLED(CONFIG_LOCKDEP))?

I'm recalling this. Because tracepoints require RCU, and RCU lockdep
doesn't trigger if a tracepoint isn't enabled (because the rcu calls
are hidden in the __DO_TRACE() behind that static_key_false), we would
be missing lots of rcu problem tracepoints because tests were run
without them enabled.

The answer was to add this rcu check when LOCKDEP was enabled. So no,
adding that conditional isn't going to help, because lockdep will
trigger here, even if it were safe because of the conditional :-/.
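
As a rough user-space illustration of the point above (a sketch only; every
name in it is a hypothetical stand-in and none of it is a kernel API), the
wrapper can be modeled as two independent steps: the probe call, which is
gated by the static key and by the condition, and the lockdep-only RCU check,
which is gated by neither:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
static bool tracepoint_enabled;      /* models the tracepoint's static key   */
static bool lockdep_enabled = true;  /* models IS_ENABLED(CONFIG_LOCKDEP)    */
static bool cpu_is_online = true;    /* models cpu_online(smp_processor_id()) */
static int rcu_splats;               /* "suspicious RCU usage" reports       */
static int events_recorded;          /* events delivered to probes           */

/* Models an RCU-lockdep check: RCU use from an offline CPU is reported. */
static void model_rcu_check(void)
{
	if (!cpu_is_online)
		rcu_splats++;
}

/* Models trace_tlb_flush() built from a conditional tracepoint. */
static void model_trace_tlb_flush(void)
{
	if (tracepoint_enabled) {
		/* The condition is only evaluated when tracing is on. */
		if (cpu_is_online) {
			model_rcu_check();   /* probes are walked under RCU */
			events_recorded++;
		}
	}
	/* The extra check runs regardless of the static key or condition. */
	if (lockdep_enabled)
		model_rcu_check();
}
```

With tracepoint_enabled false and cpu_is_online false, a call to
model_trace_tlb_flush() records no event but still bumps rcu_splats, which is
the behaviour described above: the condition makes the probe call safe, yet
the unconditional check still complains.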

That said, let's add this (on top of the old patch):

(again, not tested)

Signed-off-by: Steven Rostedt rost...@goodmis.org
---
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 4b75d591eb5e..401b5bfbcdbd 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -47,7 +47,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 
 		/* Re-load page tables */
 		load_cr3(next->pgd);
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+		/*
+		 * Do not check rcu when tracing is not enabled. The
+		 * tracepoint has a condition to not trace if the CPU is
+		 * offline, and rcu check will complain if it is.
+		 */
+		trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
@@ -84,7 +89,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
-			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+			/*
+			 * Do not check rcu when tracing is not enabled. The
+			 * tracepoint has a condition to not trace if the CPU is
+			 * offline, and rcu check will complain if it is.
+			 */
+			trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH,
+						    TLB_FLUSH_ALL);
 			load_LDT_nolock(&next->context);
 		}
 	}
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index e08e21e5f601..747a05aceb60 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -179,6 +179,14 @@ extern void syscall_unregfunc(void);
rcu_read_unlock_sched_notrace();\
}   \
}   \
+   static inline void trace_##name##_rcu_nocheck(proto)\
+   {   \
+   if

Re: linux-next: Tree for Feb 4

[...]
 That said, let's add this (on top of the old patch):


 Which old patch?
 tlb: Don't do trace_tlb_flush() on offline CPUs ?


Or did you mean x86/mm: Omit switch_mm() tracing for offline CPUs

- Sedat -

Re: linux-next: Tree for Feb 4

On Fri, Feb 6, 2015 at 12:11 AM, Steven Rostedt rost...@goodmis.org wrote:
 On Thu, 5 Feb 2015 23:16:21 +0100
 Sedat Dilek sedat.di...@gmail.com wrote:

 On Thu, Feb 5, 2015 at 11:09 PM, Steven Rostedt rost...@goodmis.org wrote:
  On Thu, 5 Feb 2015 22:45:59 +0100
  Sedat Dilek sedat.di...@gmail.com wrote:
 
  Steve, this was a typo it's called tlb_flush not tlb_flush*ed*:
 
  Heh, yeah, I typed that entire line in by hand. Just be lucky that was
  the only typo ;-)
 
 
  # cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
  1
 
  [  391.090381] intel_pstate CPU 1 exiting
  [  391.104491] smpboot: CPU 1 is now offline
 
 
  Now, if you disable that (echo 0 to that file), do you still get the
  rcu lockdep splat if you suspend and resume?
 

 YES, I get the call-trace again!


 Bah! I see where the warning comes from. In include/linux/tracepoint.h
 we have:

 #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
 	extern struct tracepoint __tracepoint_##name;			\
 	static inline void trace_##name(proto)				\
 	{								\
 		if (static_key_false(&__tracepoint_##name.key))		\
 			__DO_TRACE(&__tracepoint_##name,		\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
 				TP_CONDITION(cond),,);			\
 		if (IS_ENABLED(CONFIG_LOCKDEP)) {			\
 			rcu_read_lock_sched_notrace();			\
 			rcu_dereference_sched(__tracepoint_##name.funcs);\
 			rcu_read_unlock_sched_notrace();		\
 		}							\
 	}								\

 See that if (IS_ENABLED(CONFIG_LOCKDEP))?


I have here...

CONFIG_LOCKDEP=y

- Sedat -

 I'm recalling this. Because tracepoints require RCU, and RCU lockdep
 doesn't trigger if a tracepoint isn't enabled (because the rcu calls
 are hidden in the __DO_TRACE() behind that static_key_false), we would
 be missing lots of rcu problem tracepoints because tests were run
 without them enabled.

 The answer was to add this rcu check when LOCKDEP was enabled. So no,
 adding that conditional isn't going to help, because lockdep will
 trigger here, even if it were safe because of the conditional :-/.

 That said, let's add this (on top of the old patch):


Which old patch?
tlb: Don't do trace_tlb_flush() on offline CPUs ?

- Sedat -

 (again, not tested)

 Signed-off-by: Steven Rostedt rost...@goodmis.org
 ---
 diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
 index 4b75d591eb5e..401b5bfbcdbd 100644
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -47,7 +47,12 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,

 		/* Re-load page tables */
 		load_cr3(next->pgd);
 -		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 +		/*
 +		 * Do not check rcu when tracing is not enabled. The
 +		 * tracepoint has a condition to not trace if the CPU is
 +		 * offline, and rcu check will complain if it is.
 +		 */
 +		trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
 @@ -84,7 +89,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
 -			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 +			/*
 +			 * Do not check rcu when tracing is not enabled. The
 +			 * tracepoint has a condition to not trace if the CPU is
 +			 * offline, and rcu check will complain if it is.
 +			 */
 +			trace_tlb_flush_rcu_nocheck(TLB_FLUSH_ON_TASK_SWITCH,
 +						    TLB_FLUSH_ALL);
 			load_LDT_nolock(&next->context);
 		}
 	}
 diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
 index e08e21e5f601..747a05aceb60 100644
 --- a/include/linux/tracepoint.h
 +++ b/include/linux/tracepoint.h
 @@ -179,6 +179,14 @@ extern void syscall_unregfunc(void);
 rcu_read_unlock_sched_notrace();\
 }   \

Re: linux-next: Tree for Feb 4

On Fri, Feb 6, 2015 at 1:12 AM, Steven Rostedt rost...@goodmis.org wrote:
 On Fri, 6 Feb 2015 00:53:41 +0100
 Sedat Dilek sedat.di...@gmail.com wrote:

  See that if (IS_ENABLED(CONFIG_LOCKDEP))?
 

 I have here...

 CONFIG_LOCKDEP=y

 Yep, I knew that (you wouldn't get splats without it).


 Which old patch?
 tlb: Don't do trace_tlb_flush() on offline CPUs ?

 Yeah, that one. In other words, just add this patch on the kernel you
 just tested.

 Thanks,


Do you have a name with label for your patch?

- Sedat -

Re: linux-next: Tree for Feb 4

On Fri, 6 Feb 2015 00:53:41 +0100
Sedat Dilek sedat.di...@gmail.com wrote:

  See that if (IS_ENABLED(CONFIG_LOCKDEP))?
 
 
 I have here...
 
 CONFIG_LOCKDEP=y

Yep, I knew that (you wouldn't get splats without it).


 Which old patch?
 tlb: Don't do trace_tlb_flush() on offline CPUs ?

Yeah, that one. In other words, just add this patch on the kernel you
just tested.

Thanks,

-- Steve

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 03:57:12PM +0100, Sedat Dilek wrote:
 On Thu, Feb 5, 2015 at 8:14 AM, Dave Hansen d...@sr71.net wrote:
  On 02/04/2015 05:53 PM, Sedat Dilek wrote:
  The architecture-specific switch_mm() function can be called by offline
  CPUs, but includes event tracing, which cannot be legally carried out
  on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
  this splat by omitting the tracing when the CPU is offline.
  ...
 	load_cr3(next->pgd);
-	trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+	if (cpu_online(smp_processor_id()))
+		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
  Is this, perhaps, something that we should be doing in the generic trace
  code so that all of the trace users don't have to worry about it?  Also,
  this patch will add overhead to the code when tracing is off.  It would
  be best if we could manage to make the cpu_online() check only in the
  cases where the tracepoint is on.
 
 Hi Dave,
 
 thanks for your feedback.
 
 I have just seen that I again see the call-trace.

When you get well, could you please send that call trace?

 Maybe you can discuss with Paul and others or offer a proposal patch.

The other possibility is to have a CONFIG_ARCH_DYING_IDLE or some such
that allows this particular flavor of x86 to invoke the CPU_DYING_IDLE
from after the call to switch_mm().  Dave, does that make sense?

My guess would be that there should be a cpu_dying_idle_generic() invoked
from cpu_idle_loop(), and a cpu_dying_idle_native() invoked at the end
of idle_task_exit().  Or can I get away with just moving the current
rcu_notify_cpu() call from cpu_idle_loop() to the end of idle_task_exit()?

A quick look at the calls to idle_task_exit() makes this look plausible.
There are a number of calls to printk() and to complete() that need help,
but that is a pre-existing issue in any case, as both these code paths
have RCU readers that are having no effect on offline CPUs.

Dave, thoughts?

 I should really do something for my recovery (influenza).
 Instead of laying lazy in my bed I thought to update my Linux kernels
 and graphics driver stack which made me happy.

Get well, being sick is bad for your health!  ;-)

Thanx, Paul

 Regards,
 - Sedat -
 


Re: linux-next: Tree for Feb 4

On Wed, 04 Feb 2015 23:14:55 -0800
Dave Hansen d...@sr71.net wrote:

 On 02/04/2015 05:53 PM, Sedat Dilek wrote:
  The architecture-specific switch_mm() function can be called by offline
  CPUs, but includes event tracing, which cannot be legally carried out
  on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
  this splat by omitting the tracing when the CPU is offline.
 ...
 	load_cr3(next->pgd);
-	trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+	if (cpu_online(smp_processor_id()))
+		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
 Is this, perhaps, something that we should be doing in the generic trace
 code so that all of the trace users don't have to worry about it?  Also,
 this patch will add overhead to the code when tracing is off.  It would
 be best if we could manage to make the cpu_online() check only in the
 cases where the tracepoint is on.

Note, we can move the check into the code that enables or disables
trace points. I believe, the rcu part of a tracepoint is only the call
to the callbacks. The jump_label part should be safe outside of rcu.

In that case, instead, have this, which does exactly the same thing
without having any overhead of the branch when tracing is disabled:

(not tested)

Signed-off-by: Steven Rostedt rost...@goodmis.org
---
diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..040c1cdfe6d1 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
 	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
 	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }
 
-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,
 
TP_PROTO(int reason, unsigned long pages),
TP_ARGS(reason, pages),
 
+   TP_CONDITION(cpu_online(smp_processor_id()),
+
TP_STRUCT__entry(
__field(  int, reason)
__field(unsigned long,  pages)
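
The difference between the two approaches discussed in this message can be
sketched in user space as follows. This is only a model under stated
assumptions: the names are hypothetical, a plain bool stands in for the
jump label, and a counter stands in for the cost of the cpu_online() test.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel symbols. */
static bool tracepoint_on;        /* models the jump label / static key */
static bool cpu_online_state = true;
static int online_checks;         /* how often the online test was run  */
static int events;                /* events that reached the probes     */

/* Models cpu_online(smp_processor_id()) and counts each evaluation. */
static bool model_cpu_online(void)
{
	online_checks++;
	return cpu_online_state;
}

/* Call-site check: the test runs on every call, tracing on or off. */
static void trace_with_callsite_check(void)
{
	if (model_cpu_online()) {
		if (tracepoint_on)
			events++;
	}
}

/* Conditional tracepoint: the test runs only once tracing is on. */
static void trace_with_condition(void)
{
	if (tracepoint_on) {
		if (model_cpu_online())
			events++;
	}
}
```

With tracepoint_on false, calling trace_with_callsite_check() increments
online_checks every time, while trace_with_condition() leaves it untouched;
both variants record the same events once the tracepoint is switched on.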

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 13:03:43 -0500
Steven Rostedt rost...@goodmis.org wrote:

 (not tested)
 
 Signed-off-by: Steven Rostedt rost...@goodmis.org
 ---
 diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
 index 13391d288107..040c1cdfe6d1 100644
 --- a/include/trace/events/tlb.h
 +++ b/include/trace/events/tlb.h
 @@ -13,11 +13,13 @@
  	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
  	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }
  
 -TRACE_EVENT(tlb_flush,
 +TRACE_EVENT_CONDITION(tlb_flush,
  
   TP_PROTO(int reason, unsigned long pages),
   TP_ARGS(reason, pages),
  
 + TP_CONDITION(cpu_online(smp_processor_id()),
 +

I said it wasn't tested. I also forgot to hit save after I realized I
was missing a ')'.

-- Steve

Take two:

diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..0e7635765153 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
 	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
 	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }
 
-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,
 
TP_PROTO(int reason, unsigned long pages),
TP_ARGS(reason, pages),
 
+   TP_CONDITION(cpu_online(smp_processor_id())),
+
TP_STRUCT__entry(
__field(  int, reason)
__field(unsigned long,  pages)

Re: linux-next: Tree for Feb 4

On 02/05/2015 10:08 AM, Steven Rostedt wrote:
 --- a/include/trace/events/tlb.h
 +++ b/include/trace/events/tlb.h
 @@ -13,11 +13,13 @@
  	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
  	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }
  
 -TRACE_EVENT(tlb_flush,
 +TRACE_EVENT_CONDITION(tlb_flush,
  
   TP_PROTO(int reason, unsigned long pages),
   TP_ARGS(reason, pages),
  
 + TP_CONDITION(cpu_online(smp_processor_id())),

That's a pretty reasonable fix, although it would be nice if the
debugging was easier to hit.  Did I actually need to be
onlining/offlining CPUs to hit the splat that Sedat was reporting?




Re: linux-next: Tree for Feb 4

On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
Did I actually need to be
  onlining/offlining CPUs to hit the splat that Sedat was reporting?
 Yep, you do need to offline at least one CPU to hit that splat.

Heh, do we need a debugging mode that will randomly offline/online CPUs? :)

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
 On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
 Did I actually need to be
   onlining/offlining CPUs to hit the splat that Sedat was reporting?
  Yep, you do need to offline at least one CPU to hit that splat.
 
 Heh, do we need a debugging mode that will randomly offline/online CPUs? :)

For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
are your friends.  ;-)

The problem is that I only run RCU-relevant combinations of Kconfigs,
which means that I missed the ones that Sedat used to find this problem.
So I guess it is a good thing that others run -next testing.

Thanx, Paul


Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 10:11:31AM -0800, Dave Hansen wrote:
 On 02/05/2015 10:08 AM, Steven Rostedt wrote:
  --- a/include/trace/events/tlb.h
  +++ b/include/trace/events/tlb.h
  @@ -13,11 +13,13 @@
  	{ TLB_LOCAL_SHOOTDOWN,		"local shootdown" },		\
  	{ TLB_LOCAL_MM_SHOOTDOWN,	"local mm shootdown" }
   
  -TRACE_EVENT(tlb_flush,
  +TRACE_EVENT_CONDITION(tlb_flush,
   
  TP_PROTO(int reason, unsigned long pages),
  TP_ARGS(reason, pages),
   
  +   TP_CONDITION(cpu_online(smp_processor_id())),
 
 That's a pretty reasonable fix, although it would be nice if the
 debugging was easier to hit.

Looks very good to me!

Unless someone else speaks up, I will carry this patch.

   Did I actually need to be
 onlining/offlining CPUs to hit the splat that Sedat was reporting?

Yep, you do need to offline at least one CPU to hit that splat.

Thanx, Paul


Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 08:25:21PM +0100, Sedat Dilek wrote:
 On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
 paul...@linux.vnet.ibm.com wrote:
  On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
  On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
  Did I actually need to be
onlining/offlining CPUs to hit the splat that Sedat was reporting?
   Yep, you do need to offline at least one CPU to hit that splat.
 
  Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
 
  For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
  are your friends.  ;-)
 
  The problem is that I only run RCU-relevant combinations of Kconfigs,
  which means that I missed the ones that Sedat used to find this problem.
  So I guess it is a good thing that others run -next testing.
 
 
 [ Revived by a voltaren resinat pill... ]
 
 I reverted x86/mm: Omit switch_mm() tracing for offline CPUs
 ...and...
 applied tlb: Don't do trace_tlb_flush() on offline CPUs
 ...in my build-dir.
 ( I did not build from scratch but re-invoking make updated the
 files touched by Steven's patch, see attached build-log. )
 
 Unfortunately, the call-trace remains when doing an offlining of cpu1.
 ( It's good to see it's reproducible. )
 
 root# echo 0 > /sys/devices/system/cpu/cpu1/online
 
 [  121.652796] intel_pstate CPU 1 exiting
 [  121.666272]
 [  121.666274] ===
 [  121.666274] [ INFO: suspicious RCU usage. ]
 [  121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
 [  121.666278] ---
 [  121.666280] include/trace/events/tlb.h:37 suspicious
 rcu_dereference_check() usage!
 [  121.666281]
 [  121.666281] other info that might help us debug this:
 [  121.666281]
 [  121.666282]
 [  121.666282] RCU used illegally from offline CPU!
 [  121.666282] rcu_scheduler_active = 1, debug_locks = 0
 [  121.666283] no locks held by swapper/1/0.
 [  121.666284]
 [  121.666284] stack backtrace:
 [  121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.7-iniza-small #4
 [  121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [  121.666293]  0001 88011a44fe18 817e39cd
 0011
 [  121.666296]  88011a448290 88011a44fe48 810d6af7
 8800d3dfaac0
 [  121.666299]  0001 81d32ce0 0005
 88011a44fe78
 [  121.666300] Call Trace:
 [  121.666308]  [817e39cd] dump_stack+0x4c/0x65
 [  121.666313]  [810d6af7] lockdep_rcu_suspicious+0xe7/0x120
 [  121.666318]  [810b73f9] idle_task_exit+0x1c9/0x260
 [  121.666322]  [81054c4e] play_dead_common+0xe/0x50
 [  121.666325]  [81054ca5] native_play_dead+0x15/0x140
 [  121.666330]  [8102963f] arch_cpu_idle_dead+0xf/0x20
 [  121.666333]  [810cdb4e] cpu_startup_entry+0x37e/0x580
 [  121.666336]  [81053e20] start_secondary+0x140/0x150
 [  121.666744] smpboot: CPU 1 is now offline
 
 From rcu point this is now safe?
 But another area (linux-pm?) is still affected?
 I will try to test vanilla pm-next if the problem exists with
 intel_pstate as suggested by Rafael.
 Hmmm, not sure how I can get the pm-next code which went into
 next-20150204, as linux-pm.git#linux-next has since been fed with new stuff.

At this point, I am starting to think in terms of moving the new
CPU_DYING_IDLE notification later in the offline sequence.  This will
take me a bit to get set up correctly, but I hope to have a patch
some time tomorrow (Friday), Pacific time.

Thanx, Paul

 - Sedat -

   CHK include/config/kernel.release
 make KBUILD_SRC=
   CHK include/config/kernel.release
   CHK include/generated/uapi/linux/version.h
   CHK include/generated/utsrelease.h
   CALLscripts/checksyscalls.sh
   CHK include/generated/compile.h
   CC  arch/x86/mm/init.o
   CC  arch/x86/mm/init_64.o
   CC  mm/mmu_context.o
   CC  kernel/fork.o
   CC  arch/x86/kernel/process_64.o
   CC  mm/mmap.o
   CC  arch/x86/kernel/ldt.o
   CC  arch/x86/mm/tlb.o
   CC  arch/x86/kernel/setup.o
   LD  arch/x86/mm/built-in.o
   CC  kernel/exit.o
   LD  mm/built-in.o
   CC  arch/x86/xen/mmu.o
   CC  arch/x86/kernel/apic/ipi.o
   CC  fs/exec.o
   LD  arch/x86/kernel/apic/built-in.o
   CC  kernel/power/snapshot.o
   CC  arch/x86/kernel/cpu/common.o
   LD  kernel/power/built-in.o
   LD  arch/x86/xen/built-in.o
   CC  kernel/sched/core.o
   LD  arch/x86/kernel/cpu/built-in.o
   CC  arch/x86/kernel/smp.o
   CC  arch/x86/kernel/machine_kexec_64.o
   LD  arch/x86/kernel/built-in.o
   LD  arch/x86/built-in.o
   LD  kernel/sched/built-in.o
   CC  kernel/module.o
   CC  fs/compat.o
   CHK kernel/config_data.h
   LD

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 4:17 AM, Martin K. Petersen
martin.peter...@oracle.com wrote:
 Sedat == Sedat Dilek sedat.di...@gmail.com writes:

 Sedat No, but I am here on a so-called WUBI installation which
 Sedat triggered some bugs being an exotic installation.  My
 Sedat Ubuntu/precise is a 18GiB image laying on my Win7 partition
 Sedat (/dev/sda2).

 I've been mulling over this for a while and can't come up with a good
 approach. So let's just nuke these warnings.

 --
 Martin K. Petersen  Oracle Linux Engineering


 block: Quiesce zeroout wrapper

 blkdev_issue_zeroout() printed a warning if a device failed a discard or
 write same request despite advertising support for these. That's fine
 for SCSI since we'll disable these commands if we get an error back from
 the disk saying that they are not supported. And consequently the
 warning only gets printed once.

 There are other types of block devices that support discard, however,
 and these may return -EOPNOTSUPP for each command but leave discard
 enabled in the queue limits. This will cause a warning message for every
 blkdev_issue_zeroout() invocation.

 Remove the offending warning messages.

 Reported-by: Sedat Dilek sedat.di...@gmail.com
 Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
 ---
  block/blk-lib.c |   26 +++---
  1 file changed, 7 insertions(+), 19 deletions(-)

 diff --git a/block/blk-lib.c b/block/blk-lib.c
 index 715e948f58a4..7688ee3f5d72 100644
 --- a/block/blk-lib.c
 +++ b/block/blk-lib.c
 @@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
   * @discard:   whether to discard the block range
   *
   * Description:
 -
   *  Zero-fill a block range.  If the discard flag is set and the block
   *  device guarantees that subsequent READ operations to the block range
   *  in question will return zeroes, the blocks will be discarded. Should
 @@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
  {
 	struct request_queue *q = bdev_get_queue(bdev);
 -	unsigned char bdn[BDEVNAME_SIZE];
 -
 -	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {

 -		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
 -			return 0;
 -
 -		bdevname(bdev, bdn);
 -		pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
 -	}
 +	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
 +	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
 +		return 0;

 -	if (bdev_write_same(bdev)) {
 -
 -		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
 -					     ZERO_PAGE(0)))
 -			return 0;
 -
 -		bdevname(bdev, bdn);
 -		pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
 -	}
 +	if (bdev_write_same(bdev) &&
 +	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
 +				    ZERO_PAGE(0)) == 0)
 +		return 0;

 return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
  }
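
The fallback order that the patch keeps (discard, then write same, then
manual zeroing) can be sketched in user space like this. It is only an
illustration: the struct, its fields and model_zeroout() are hypothetical
names, and the function reports which path would be taken instead of issuing
any I/O.

```c
#include <stdbool.h>

/* Which path ends up zeroing the range in this model. */
enum zero_method { ZERO_BY_DISCARD, ZERO_BY_WRITE_SAME, ZERO_MANUALLY };

/* Hypothetical description of a block device; not a kernel structure. */
struct model_bdev {
	bool discard_supported;    /* queue advertises discard           */
	bool discard_zeroes_data;  /* discarded blocks read back as zero */
	bool discard_fails;        /* device rejects the command anyway  */
	bool write_same_supported; /* device advertises write same       */
	bool write_same_fails;     /* device rejects the command anyway  */
};

/*
 * Try the cheap methods first and fall through silently on failure,
 * mirroring the structure of the patched function.
 */
static enum zero_method model_zeroout(const struct model_bdev *dev,
				      bool discard)
{
	if (discard && dev->discard_supported && dev->discard_zeroes_data &&
	    !dev->discard_fails)
		return ZERO_BY_DISCARD;

	if (dev->write_same_supported && !dev->write_same_fails)
		return ZERO_BY_WRITE_SAME;

	return ZERO_MANUALLY;
}
```

A device that advertises discard but rejects every request simply ends up on
the manual path each time, without a message per call.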

Martin, will you send a separate patch for that?

Thanks.

- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 20:25:21 +0100
Sedat Dilek sedat.di...@gmail.com wrote:

 On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
 paul...@linux.vnet.ibm.com wrote:
  On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
  On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
  Did I actually need to be
onlining/offlining CPUs to hit the splat that Sedat was reporting?
   Yep, you do need to offline at least one CPU to hit that splat.
 
  Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
 
  For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
  are your friends.  ;-)
 
  The problem is that I only run RCU-relevant combinations of Kconfigs,
  which means that I missed the ones that Sedat used to find this problem.
  So I guess it is a good thing that others run -next testing.
 
 
 [ Revived by a voltaren resinat pill... ]
 
 I reverted x86/mm: Omit switch_mm() tracing for offline CPUs
 ...and...
 applied tlb: Don't do trace_tlb_flush() on offline CPUs
 ...in my build-dir.

Is this Paul's version of the patch or mine? If it is just mine, do you
know if Paul's version triggers this too?

 ( I did not build from scratch but re-invoking make updated the
 files touched by Steven's patch, see attached build-log. )
 
 Unfortunately, the call-trace remains when doing an offlining of cpu1.
 ( It's good to see it's reproducible. )

Was the tracepoint enabled? Or was there some other rcu call that
triggered this. Or would cpu_online(smp_processor_id()) return true at
this point?

-- Steve

 
 root# echo 0 > /sys/devices/system/cpu/cpu1/online
 
 [  121.652796] intel_pstate CPU 1 exiting
 [  121.666272]
 [  121.666274] ===
 [  121.666274] [ INFO: suspicious RCU usage. ]
 [  121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
 [  121.666278] ---
 [  121.666280] include/trace/events/tlb.h:37 suspicious
 rcu_dereference_check() usage!
 [  121.666281]
 [  121.666281] other info that might help us debug this:
 [  121.666281]
 [  121.666282]
 [  121.666282] RCU used illegally from offline CPU!
 [  121.666282] rcu_scheduler_active = 1, debug_locks = 0
 [  121.666283] no locks held by swapper/1/0.
 [  121.666284]
 [  121.666284] stack backtrace:
 [  121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.7-iniza-small #4
 [  121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [  121.666293]  0001 88011a44fe18 817e39cd
 0011
 [  121.666296]  88011a448290 88011a44fe48 810d6af7
 8800d3dfaac0
 [  121.666299]  0001 81d32ce0 0005
 88011a44fe78
 [  121.666300] Call Trace:
 [  121.666308]  [817e39cd] dump_stack+0x4c/0x65
 [  121.666313]  [810d6af7] lockdep_rcu_suspicious+0xe7/0x120
 [  121.666318]  [810b73f9] idle_task_exit+0x1c9/0x260
 [  121.666322]  [81054c4e] play_dead_common+0xe/0x50
 [  121.666325]  [81054ca5] native_play_dead+0x15/0x140
 [  121.666330]  [8102963f] arch_cpu_idle_dead+0xf/0x20
 [  121.666333]  [810cdb4e] cpu_startup_entry+0x37e/0x580
 [  121.666336]  [81053e20] start_secondary+0x140/0x150
 [  121.666744] smpboot: CPU 1 is now offline
 
 From rcu point this is now safe?
 But another area (linux-pm?) is still affected?
 I will try to test vanilla pm-next if the problem exists with
 intel_pstate as suggested by Rafael.
 Hmmm, not sure how I can get the pm-next code which went into
 next-20150204, as linux-pm.git#linux-next has since been fed with new stuff.
 
 
 - Sedat -


Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 8:58 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Thu, 5 Feb 2015 20:25:21 +0100
 Sedat Dilek sedat.di...@gmail.com wrote:

 On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
 paul...@linux.vnet.ibm.com wrote:
  On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
  On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
  Did I actually need to be
onlining/offlining CPUs to hit the splat that Sedat was reporting?
   Yep, you do need to offline at least one CPU to hit that splat.
 
  Heh, do we need a debugging mode that will randomly offline/online CPUs? 
  :)
 
  For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
  are your friends.  ;-)
 
  The problem is that I only run RCU-relevant combinations of Kconfigs,
  which means that I missed the ones that Sedat used to find this problem.
  So I guess it is a good thing that others run -next testing.
 

 [ Revived by a voltaren resinat pill... ]

 I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"
 ...and...
 applied "tlb: Don't do trace_tlb_flush() on offline CPUs"
 ...in my build-dir.

 Is this Paul's version of the patch or mine? If it is just mine, do you
 know if Paul's version triggers this too?


This is the one which entered Paul's rcu-next tree.

[1]
http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41

 ( I did not build from scratch but re-invoking make updated the
 files touched by Steven's patch, see attached build-log. )

 Unfortunately, the call-trace remains when doing an offlining of cpu1.
 ( It's good to see it's reproducible. )

 Was the tracepoint enabled? Or was there some other rcu call that
 triggered this. Or would cpu_online(smp_processor_id()) return true at
 this point?


Thanks Steve for jumping into this one!

Good point.
I looked at my kernel-config (which I already sent :-)).

Do I need to enable...?

# CONFIG_RCU_TRACE is not set

...or even more?

- Sedat -

 -- Steve


 root# echo 0 > /sys/devices/system/cpu/cpu1/online

 [  121.652796] intel_pstate CPU 1 exiting
 [  121.666272]
 [  121.666274] ===
 [  121.666274] [ INFO: suspicious RCU usage. ]
 [  121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
 [  121.666278] ---
 [  121.666280] include/trace/events/tlb.h:37 suspicious
 rcu_dereference_check() usage!
 [  121.666281]
 [  121.666281] other info that might help us debug this:
 [  121.666281]
 [  121.666282]
 [  121.666282] RCU used illegally from offline CPU!
 [  121.666282] rcu_scheduler_active = 1, debug_locks = 0
 [  121.666283] no locks held by swapper/1/0.
 [  121.666284]
 [  121.666284] stack backtrace:
 [  121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.7-iniza-small #4
 [  121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [  121.666293]  0001 88011a44fe18 817e39cd
 0011
 [  121.666296]  88011a448290 88011a44fe48 810d6af7
 8800d3dfaac0
 [  121.666299]  0001 81d32ce0 0005
 88011a44fe78
 [  121.666300] Call Trace:
 [  121.666308]  [817e39cd] dump_stack+0x4c/0x65
 [  121.666313]  [810d6af7] lockdep_rcu_suspicious+0xe7/0x120
 [  121.666318]  [810b73f9] idle_task_exit+0x1c9/0x260
 [  121.666322]  [81054c4e] play_dead_common+0xe/0x50
 [  121.666325]  [81054ca5] native_play_dead+0x15/0x140
 [  121.666330]  [8102963f] arch_cpu_idle_dead+0xf/0x20
 [  121.666333]  [810cdb4e] cpu_startup_entry+0x37e/0x580
 [  121.666336]  [81053e20] start_secondary+0x140/0x150
 [  121.666744] smpboot: CPU 1 is now offline

 From the RCU point of view, is this now safe?
 But is another area (linux-pm?) still affected?
 I will test vanilla pm-next to see whether the problem exists with
 intel_pstate, as suggested by Rafael.
 Hmmm, I am not sure how I can get the pm-next code which went into
 next-20150204, as linux-pm.git#linux-next has been fed with new stuff.


 - Sedat -


Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 8:33 PM, Paul E. McKenney
paul...@linux.vnet.ibm.com wrote:
 On Thu, Feb 05, 2015 at 08:25:21PM +0100, Sedat Dilek wrote:
 On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney
 paul...@linux.vnet.ibm.com wrote:
  On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
  On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
  Did I actually need to be onlining/offlining CPUs to hit the splat
  that Sedat was reporting?
   Yep, you do need to offline at least one CPU to hit that splat.
 
  Heh, do we need a debugging mode that will randomly offline/online CPUs? 
  :)
 
  For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
  are your friends.  ;-)
 
  The problem is that I only run RCU-relevant combinations of Kconfigs,
  which means that I missed the ones that Sedat used to find this problem.
  So I guess it is a good thing that others run -next testing.
 

 [ Revived by a voltaren resinat pill... ]

 I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"
 ...and...
 applied "tlb: Don't do trace_tlb_flush() on offline CPUs"
 ...in my build-dir.
 ( I did not build from scratch but re-invoking make updated the
 files touched by Steven's patch, see attached build-log. )

 Unfortunately, the call-trace remains when doing an offlining of cpu1.
 ( It's good to see it's reproducible. )

 root# echo 0 > /sys/devices/system/cpu/cpu1/online

 [  121.652796] intel_pstate CPU 1 exiting
 [  121.666272]
 [  121.666274] ===
 [  121.666274] [ INFO: suspicious RCU usage. ]
 [  121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
 [  121.666278] ---
 [  121.666280] include/trace/events/tlb.h:37 suspicious
 rcu_dereference_check() usage!
 [  121.666281]
 [  121.666281] other info that might help us debug this:
 [  121.666281]
 [  121.666282]
 [  121.666282] RCU used illegally from offline CPU!
 [  121.666282] rcu_scheduler_active = 1, debug_locks = 0
 [  121.666283] no locks held by swapper/1/0.
 [  121.666284]
 [  121.666284] stack backtrace:
 [  121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.7-iniza-small #4
 [  121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [  121.666293]  0001 88011a44fe18 817e39cd
 0011
 [  121.666296]  88011a448290 88011a44fe48 810d6af7
 8800d3dfaac0
 [  121.666299]  0001 81d32ce0 0005
 88011a44fe78
 [  121.666300] Call Trace:
 [  121.666308]  [817e39cd] dump_stack+0x4c/0x65
 [  121.666313]  [810d6af7] lockdep_rcu_suspicious+0xe7/0x120
 [  121.666318]  [810b73f9] idle_task_exit+0x1c9/0x260
 [  121.666322]  [81054c4e] play_dead_common+0xe/0x50
 [  121.666325]  [81054ca5] native_play_dead+0x15/0x140
 [  121.666330]  [8102963f] arch_cpu_idle_dead+0xf/0x20
 [  121.666333]  [810cdb4e] cpu_startup_entry+0x37e/0x580
 [  121.666336]  [81053e20] start_secondary+0x140/0x150
 [  121.666744] smpboot: CPU 1 is now offline

 From the RCU point of view, is this now safe?
 But is another area (linux-pm?) still affected?
 I will test vanilla pm-next to see whether the problem exists with
 intel_pstate, as suggested by Rafael.
 Hmmm, I am not sure how I can get the pm-next code which went into
 next-20150204, as linux-pm.git#linux-next has been fed with new stuff.

 At this point, I am starting to think in terms of moving the new
 CPU_DYING_IDLE notification later in the offline sequence.  This will
 take me a bit to get set up correctly, but I hope to have a patch
 some time tomorrow (Friday), Pacific time.


Is the CPU_DYING_IDLE notification part of the RCU area?

Shall I test pm-next?

By looking at [1], I got the commit-id/sha1 which went into next-20150204.

n102: pm 12f24f2d78ce801c9330c5f682b7beb215bdbab1

If this helps you, I will do so.

For Paul :-)
- Sedat -

[1] 
http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Next/SHA1s?id=next-20150204#n102
[2] 
http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=12f24f2d78ce801c9330c5f682b7beb215bdbab1

 Thanx, Paul

 - Sedat -

   CHK include/config/kernel.release
 make KBUILD_SRC=
   CHK include/config/kernel.release
   CHK include/generated/uapi/linux/version.h
   CHK include/generated/utsrelease.h
   CALL scripts/checksyscalls.sh
   CHK include/generated/compile.h
   CC  arch/x86/mm/init.o
   CC  arch/x86/mm/init_64.o
   CC  mm/mmu_context.o
   CC  kernel/fork.o
   CC  arch/x86/kernel/process_64.o
   CC  mm/mmap.o
   CC  arch/x86/kernel/ldt.o
   CC  arch/x86/mm/tlb.o
   CC  arch/x86/kernel/setup.o
   LD  arch/x86/mm/built-in.o
   CC  kernel/exit.o
   LD  mm/built-in.o
   CC  arch/x86/xen/mmu.o
   CC

Re: linux-next: Tree for Feb 4

On Thu, 5 Feb 2015 21:07:27 +0100
Sedat Dilek sedat.di...@gmail.com wrote:

> > Is this Paul's version of the patch or mine? If it is just mine, do you
> > know if Paul's version triggers this too?
>
> This is the one which entered Paul's rcu-next tree.
>
> [1]
> http://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=2b27cf7317d8a99a50bead9faccd54b46b6f0c41

That's mine.

It looks like the condition will be tested before it calls any rcu
code. Which is why I was confused that it still gave a splat. Paul
posted a patch before this that did the check outside the trace point.

This one:

http://marc.info/?l=linux-kernel&m=142310961217650&w=2

> > > ( I did not build from scratch but re-invoking make updated the
> > > files touched by Steven's patch, see attached build-log. )
> > >
> > > Unfortunately, the call-trace remains when doing an offlining of cpu1.
> > > ( It's good to see it's reproducible. )
> >
> > Was the tracepoint enabled? Or was there some other rcu call that
> > triggered this. Or would cpu_online(smp_processor_id()) return true at
> > this point?
>
> Thanks Steve for jumping into this one!
>
> Good point.
> I looked at my kernel-config (which I already sent :-)).
>
> Do I need to enable...?
>
> # CONFIG_RCU_TRACE is not set
>
> ...or even more?

What I meant by the tracepoint being enabled, was not that it was
configured in (I'm assuming it was), but that you started tracing?

echo 1 > /sys/kernel/debug/tracing/events/enable

echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flushed/enable

-- Steve

Re: linux-next: Tree for Feb 4

2015-02-04 Thread Dave Hansen

On 02/04/2015 05:53 PM, Sedat Dilek wrote:
> The architecture-specific switch_mm() function can be called by offline
> CPUs, but includes event tracing, which cannot be legally carried out
> on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
> this splat by omitting the tracing when the CPU is offline.
...
>>> >> > load_cr3(next->pgd);
>>> >> > -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>> >> > +   if (cpu_online(smp_processor_id()))
>>> >> > +           trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

Is this, perhaps, something that we should be doing in the generic trace
code so that all of the trace users don't have to worry about it?  Also,
this patch will add overhead to the code when tracing is off.  It would
be best if we could manage to make the cpu_online() check only in the
cases where the tracepoint is on.
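
The ordering suggested here can be illustrated with a small userspace
model: test the cheap "is this event switched on?" flag first, and only
then pay for the online check. This is a sketch under stated assumptions,
not the kernel's tracepoint machinery. Every name in it
(tracepoint_enabled, this_cpu_online, model_cpu_online(),
record_tlb_flush(), MODEL_TRACE_TLB_FLUSH) is a hypothetical stand-in
and not a kernel symbol.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel symbols. */
static bool tracepoint_enabled;      /* models "has anyone switched this event on?" */
static bool this_cpu_online = true;  /* models cpu_online(smp_processor_id()) */
static int online_checks;            /* how many times the online test actually ran */
static int events_recorded;          /* how many events reached the recorder */

static bool model_cpu_online(void)
{
	online_checks++;
	return this_cpu_online;
}

static void record_tlb_flush(int reason, unsigned long pages)
{
	assert(reason >= 0);
	(void)pages;
	events_recorded++;
}

/*
 * Test the cheap "enabled" flag first.  Because && short-circuits, the
 * online check runs only while the event is switched on.
 */
#define MODEL_TRACE_TLB_FLUSH(reason, pages)				\
	do {								\
		if (tracepoint_enabled && model_cpu_online())		\
			record_tlb_flush((reason), (pages));		\
	} while (0)
```

With the event switched off, MODEL_TRACE_TLB_FLUSH() costs one flag test
and never calls model_cpu_online(); with the event switched on and the
modelled CPU offline, the online check runs but nothing is recorded.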

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 03:12:20AM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 2:53 AM, Sedat Dilek  wrote:
> > On Thu, Feb 5, 2015 at 2:51 AM, Paul E. McKenney
> >  wrote:
> >> On Thu, Feb 05, 2015 at 02:18:01AM +0100, Sedat Dilek wrote:
> >>> On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney
> >>>  wrote:
> >>> > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote:
> >>> >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney
> >>> >>  wrote:
> >>> >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
> >>> >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
> >>> >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney 
> >>> >> >> > wrote:
> >>> >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki 
> >>> >> >> > > wrote:
> >>> >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> >>> >> >
> >>> >> > [ . . . ]
> >>> >> >
> >>> >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ...
> >>> >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
> >>> >> >> > > > > [ 1144.486064]
> >>> >> >> > > > > [ 1144.486065] ===
> >>> >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
> >>> >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
> >>> >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 
> >>> >> >> > > > > Not tainted
> >>> >> >> > > > > [ 1144.486070] ---
> >>> >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> >>> >> >> > > > > rcu_dereference_check() usage!
> >>> >> >> > > > > [ 1144.486073]
> >>> >> >> > > > > [ 1144.486073] other info that might help us debug this:
> >>> >> >> > > > > [ 1144.486073]
> >>> >> >> > > > > [ 1144.486074]
> >>> >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU!
> >>> >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> >>> >> >> > > > > [ 1144.486076] no locks held by swapper/1/0.
> >>> >> >> > > > > [ 1144.486076]
> >>> >> >> > > > > [ 1144.486076] stack backtrace:
> >>> >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> >>> >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
> >>> >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> >>> >> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 
> >>> >> >> > > > > 03/28/2013
> >>> >> >> > > > > [ 1144.486085]  0001 88011a44fe18 
> >>> >> >> > > > > 817e370d
> >>> >> >> > > > > 0011
> >>> >> >> > > > > [ 1144.486088]  88011a448290 88011a44fe48 
> >>> >> >> > > > > 810d6847
> >>> >> >> > > > > 8800c66b9600
> >>> >> >> > > > > [ 1144.486091]  0001 88011a44c000 
> >>> >> >> > > > > 81cb3900
> >>> >> >> > > > > 88011a44fe78
> >>> >> >> > > > > [ 1144.486092] Call Trace:
> >>> >> >> > > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
> >>> >> >> > > > > [ 1144.486104]  [] 
> >>> >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120
> >>> >> >> > >
> >>> >> >> > > As near as I can tell, idle_task_exit() is running on an 
> >>> >> >> > > offline CPU,
> >>> >> >> > > then calling switch_mm() which contains trace_tlb_flush(), 
> >>> >> >> > > which uses RCU.
> >>> >> >> > > And RCU is objecting to being used from a CPU that it is 
> >>> >> >> > > ignoring.
> >>> >> >> > >
> >>> >> >> > > One approach would be to push RCU's idea of when the CPU goes 
> >>> >> >> > > offline
> >>> >> >> > > down into arch code in this case, using some Kconfig symbol and
> >>> >> >> > > the usual conditional compilation.  Another approach would be to
> >>> >> >> > > invoke the trace calls under cpu_online(), for example, for the
> >>> >> >> > > first such call in switch_mm():
> >>> >> >> > >
> >>> >> >> > >   if (cpu_online(smp_processor_id()))
> >>> >> >> > >           trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> >>> >> >> > >
> >>> >> >> > > The compiler would discard this if tracing was disabled.
> >>> >> >> >
> >>> >> >> > That looks less intrusive to me.
> >>> >> >>
> >>> >> >> One possible concern is increased context-switch path length, but 
> >>> >> >> that
> >>> >> >> would only be the case where tracing is enabled by default.
> >>> >> >
> >>> >> > Nevertheless, here is an untested patch.  Does it help?
> >>> >>
> >>> >> No bedtime :-)
> >>> >
> >>> > Sorry!  Actually, getting results tomorrow would be plenty OK by me.
> >>> >
> >>> >> I tried with a revert of...
> >>> >>
> >>> >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
> >>> >> rcu: Handle outgoing CPUs on exit from idle loop
> >>> >>
> >>> >> ...and offlining cpu1 seems not to produce the trace...
> >>> >
> >>> > As expected.  The trace can still appear, but the outgoing CPU needs to
> >>> > be delayed by at least one jiffy on its final pass through the idle 
> >>> > loop.
> >>> > Which can really happen in virtualized environments.
> >>> >
> >>> >> [  115.280244]

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 4:17 AM, Martin K. Petersen
 wrote:
>> "Sedat" == Sedat Dilek  writes:
>
> Sedat> No, but I am here on a so-called WUBI installation which,
> Sedat> being an exotic installation, has triggered some bugs.  My
> Sedat> Ubuntu/precise is an 18GiB image lying on my Win7 partition
> Sedat> (/dev/sda2).
>
> I've been mulling over this for a while and can't come up with a good
> approach. So let's just nuke these warnings.
>
> --
> Martin K. Petersen  Oracle Linux Engineering
>
>
> block: Quiesce zeroout wrapper
>
> blkdev_issue_zeroout() printed a warning if a device failed a discard or
> write same request despite advertising support for these. That's fine
> for SCSI since we'll disable these commands if we get an error back from
> the disk saying that they are not supported. And consequently the
> warning only gets printed once.
>
> There are other types of block devices that support discard, however,
> and these may return -EOPNOTSUPP for each command but leave discard
> enabled in the queue limits. This will cause a warning message for every
> blkdev_issue_zeroout() invocation.
>
> Remove the offending warning messages.
>
> Reported-by: Sedat Dilek 

Thanks for the fix!

Tested-by: Sedat Dilek 

- Sedat -

> Signed-off-by: Martin K. Petersen 
> ---
>  block/blk-lib.c |   26 +++---
>  1 file changed, 7 insertions(+), 19 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 715e948f58a4..7688ee3f5d72 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>   * @discard:   whether to discard the block range
>   *
>   * Description:
> -
>   *  Zero-fill a block range.  If the discard flag is set and the block
>   *  device guarantees that subsequent READ operations to the block range
>   *  in question will return zeroes, the blocks will be discarded. Should
> @@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
> -	unsigned char bdn[BDEVNAME_SIZE];
> -
> -	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {
>
> -		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
> -			return 0;
> -
> -		bdevname(bdev, bdn);
> -		pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
> -	}
> +	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
> +	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
> +		return 0;
>
> -	if (bdev_write_same(bdev)) {
> -
> -		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> -					     ZERO_PAGE(0)))
> -			return 0;
> -
> -		bdevname(bdev, bdn);
> -		pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
> -	}
> +	if (bdev_write_same(bdev) &&
> +	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> +				    ZERO_PAGE(0)) == 0)
> +		return 0;
>
>  	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
>  }

Re: linux-next: Tree for Feb 4

2015-02-04 Thread Martin K. Petersen

> "Sedat" == Sedat Dilek  writes:

Sedat> No, but I am here on a so-called WUBI installation which,
Sedat> being an exotic installation, has triggered some bugs.  My
Sedat> Ubuntu/precise is an 18GiB image lying on my Win7 partition
Sedat> (/dev/sda2).

I've been mulling over this for a while and can't come up with a good
approach. So let's just nuke these warnings.

-- 
Martin K. Petersen  Oracle Linux Engineering


block: Quiesce zeroout wrapper

blkdev_issue_zeroout() printed a warning if a device failed a discard or
write same request despite advertising support for these. That's fine
for SCSI since we'll disable these commands if we get an error back from
the disk saying that they are not supported. And consequently the
warning only gets printed once.

There are other types of block devices that support discard, however,
and these may return -EOPNOTSUPP for each command but leave discard
enabled in the queue limits. This will cause a warning message for every
blkdev_issue_zeroout() invocation.

Remove the offending warning messages.

Reported-by: Sedat Dilek 
Signed-off-by: Martin K. Petersen 
---
 block/blk-lib.c |   26 +++---
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 715e948f58a4..7688ee3f5d72 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -286,7 +286,6 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
  * @discard:   whether to discard the block range
  *
  * Description:
-
  *  Zero-fill a block range.  If the discard flag is set and the block
  *  device guarantees that subsequent READ operations to the block range
  *  in question will return zeroes, the blocks will be discarded. Should
@@ -303,26 +302,15 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
-	unsigned char bdn[BDEVNAME_SIZE];
-
-	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {
 
-		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
-			return 0;
-
-		bdevname(bdev, bdn);
-		pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
-	}
+	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
+	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
+		return 0;
 
-	if (bdev_write_same(bdev)) {
-
-		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
-					     ZERO_PAGE(0)))
-			return 0;
-
-		bdevname(bdev, bdn);
-		pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
-	}
+	if (bdev_write_same(bdev) &&
+	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+				    ZERO_PAGE(0)) == 0)
+		return 0;
 
 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
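
The control flow that results from this patch can be modelled in
userspace roughly as follows: try discard, then write same, then fall
back to manual zeroing, with no warning printed on either failure. This
is only a sketch under stated assumptions. struct model_dev and the
model_*() helpers are hypothetical stand-ins invented for illustration;
they are not the block layer's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a block device; the fields are illustrative only. */
struct model_dev {
	bool advertises_discard;     /* discard left enabled in the queue limits */
	bool discard_zeroes_data;    /* discarded ranges read back as zeroes */
	bool advertises_write_same;
	int discard_result;          /* 0 on success, negative on failure */
	int write_same_result;       /* 0 on success, negative on failure */
	int manual_zero_calls;       /* how many times the slow path ran */
};

static int model_discard(struct model_dev *d)
{
	return d->discard_result;
}

static int model_write_same(struct model_dev *d)
{
	return d->write_same_result;
}

static int model_manual_zero(struct model_dev *d)
{
	d->manual_zero_calls++;
	return 0;
}

/* Each fast path either succeeds and returns, or falls through silently. */
static int model_issue_zeroout(struct model_dev *d, bool discard)
{
	assert(d != NULL);

	if (discard && d->advertises_discard && d->discard_zeroes_data &&
	    model_discard(d) == 0)
		return 0;

	if (d->advertises_write_same && model_write_same(d) == 0)
		return 0;

	return model_manual_zero(d);
}
```

A device that advertises discard but fails every discard request simply
ends up on the manual path each time, which is the case the commit
message describes.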

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 2:53 AM, Sedat Dilek  wrote:
> On Thu, Feb 5, 2015 at 2:51 AM, Paul E. McKenney
>  wrote:
>> On Thu, Feb 05, 2015 at 02:18:01AM +0100, Sedat Dilek wrote:
>>> On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney
>>>  wrote:
>>> > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote:
>>> >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney
>>> >>  wrote:
>>> >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
>>> >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
>>> >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
>>> >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
>>> >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>>> >> >
>>> >> > [ . . . ]
>>> >> >
>>> >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ...
>>> >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
>>> >> >> > > > > [ 1144.486064]
>>> >> >> > > > > [ 1144.486065] ===
>>> >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
>>> >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
>>> >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not 
>>> >> >> > > > > tainted
>>> >> >> > > > > [ 1144.486070] ---
>>> >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>>> >> >> > > > > rcu_dereference_check() usage!
>>> >> >> > > > > [ 1144.486073]
>>> >> >> > > > > [ 1144.486073] other info that might help us debug this:
>>> >> >> > > > > [ 1144.486073]
>>> >> >> > > > > [ 1144.486074]
>>> >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU!
>>> >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>>> >> >> > > > > [ 1144.486076] no locks held by swapper/1/0.
>>> >> >> > > > > [ 1144.486076]
>>> >> >> > > > > [ 1144.486076] stack backtrace:
>>> >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>>> >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
>>> >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>> >> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 
>>> >> >> > > > > 03/28/2013
>>> >> >> > > > > [ 1144.486085]  0001 88011a44fe18 
>>> >> >> > > > > 817e370d
>>> >> >> > > > > 0011
>>> >> >> > > > > [ 1144.486088]  88011a448290 88011a44fe48 
>>> >> >> > > > > 810d6847
>>> >> >> > > > > 8800c66b9600
>>> >> >> > > > > [ 1144.486091]  0001 88011a44c000 
>>> >> >> > > > > 81cb3900
>>> >> >> > > > > 88011a44fe78
>>> >> >> > > > > [ 1144.486092] Call Trace:
>>> >> >> > > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
>>> >> >> > > > > [ 1144.486104]  [] 
>>> >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120
>>> >> >> > >
>>> >> >> > > As near as I can tell, idle_task_exit() is running on an offline 
>>> >> >> > > CPU,
>>> >> >> > > then calling switch_mm() which contains trace_tlb_flush(), which 
>>> >> >> > > uses RCU.
>>> >> >> > > And RCU is objecting to being used from a CPU that it is ignoring.
>>> >> >> > >
>>> >> >> > > One approach would be to push RCU's idea of when the CPU goes 
>>> >> >> > > offline
>>> >> >> > > down into arch code in this case, using some Kconfig symbol and
>>> >> >> > > the usual conditional compilation.  Another approach would be to
>>> >> >> > > invoke the trace calls under cpu_online(), for example, for the
>>> >> >> > > first such call in switch_mm():
>>> >> >> > >
>>> >> >> > >   if (cpu_online(smp_processor_id()))
>>> >> >> > >           trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>> >> >> > >
>>> >> >> > > The compiler would discard this if tracing was disabled.
>>> >> >> >
>>> >> >> > That looks less intrusive to me.
>>> >> >>
>>> >> >> One possible concern is increased context-switch path length, but that
>>> >> >> would only be the case where tracing is enabled by default.
>>> >> >
>>> >> > Nevertheless, here is an untested patch.  Does it help?
>>> >>
>>> >> No bedtime :-)
>>> >
>>> > Sorry!  Actually, getting results tomorrow would be plenty OK by me.
>>> >
>>> >> I tried with a revert of...
>>> >>
>>> >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
>>> >> rcu: Handle outgoing CPUs on exit from idle loop
>>> >>
>>> >> ...and offlining cpu1 seems not to produce the trace...
>>> >
>>> > As expected.  The trace can still appear, but the outgoing CPU needs to
>>> > be delayed by at least one jiffy on its final pass through the idle loop.
>>> > Which can really happen in virtualized environments.
>>> >
>>> >> [  115.280244] PPP BSD Compression module registered
>>> >> [  115.288761] PPP Deflate Compression module registered
>>> >> [  162.935524] intel_pstate CPU 1 exiting
>>> >> [  162.949729] smpboot: CPU 1 is now offline
>>> >>
>>> >> Will try the patch.
>>> >
>>> > Looking forward to seeing the results!
>>> >
>>> >

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 2:51 AM, Paul E. McKenney
 wrote:
> On Thu, Feb 05, 2015 at 02:18:01AM +0100, Sedat Dilek wrote:
>> On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney
>>  wrote:
>> > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote:
>> >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney
>> >>  wrote:
>> >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
>> >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
>> >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
>> >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
>> >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> >> >
>> >> > [ . . . ]
>> >> >
>> >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ...
>> >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
>> >> >> > > > > [ 1144.486064]
>> >> >> > > > > [ 1144.486065] ===
>> >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
>> >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not 
>> >> >> > > > > tainted
>> >> >> > > > > [ 1144.486070] ---
>> >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>> >> >> > > > > rcu_dereference_check() usage!
>> >> >> > > > > [ 1144.486073]
>> >> >> > > > > [ 1144.486073] other info that might help us debug this:
>> >> >> > > > > [ 1144.486073]
>> >> >> > > > > [ 1144.486074]
>> >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU!
>> >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> >> >> > > > > [ 1144.486076] no locks held by swapper/1/0.
>> >> >> > > > > [ 1144.486076]
>> >> >> > > > > [ 1144.486076] stack backtrace:
>> >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
>> >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> >> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 
>> >> >> > > > > 03/28/2013
>> >> >> > > > > [ 1144.486085]  0001 88011a44fe18 
>> >> >> > > > > 817e370d
>> >> >> > > > > 0011
>> >> >> > > > > [ 1144.486088]  88011a448290 88011a44fe48 
>> >> >> > > > > 810d6847
>> >> >> > > > > 8800c66b9600
>> >> >> > > > > [ 1144.486091]  0001 88011a44c000 
>> >> >> > > > > 81cb3900
>> >> >> > > > > 88011a44fe78
>> >> >> > > > > [ 1144.486092] Call Trace:
>> >> >> > > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
>> >> >> > > > > [ 1144.486104]  [] 
>> >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120
>> >> >> > >
>> >> >> > > As near as I can tell, idle_task_exit() is running on an offline 
>> >> >> > > CPU,
>> >> >> > > then calling switch_mm() which contains trace_tlb_flush(), which 
>> >> >> > > uses RCU.
>> >> >> > > And RCU is objecting to being used from a CPU that it is ignoring.
>> >> >> > >
>> >> >> > > One approach would be to push RCU's idea of when the CPU goes 
>> >> >> > > offline
>> >> >> > > down into arch code in this case, using some Kconfig symbol and
>> >> >> > > the usual conditional compilation.  Another approach would be to
>> >> >> > > invoke the trace calls under cpu_online(), for example, for the
>> >> >> > > first such call in switch_mm():
>> >> >> > >
>> >> >> > >   if (cpu_online(smp_processor_id()))
>> >> >> > >           trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> >> >> > >
>> >> >> > > The compiler would discard this if tracing was disabled.
>> >> >> >
>> >> >> > That looks less intrusive to me.
>> >> >>
>> >> >> One possible concern is increased context-switch path length, but that
>> >> >> would only be the case where tracing is enabled by default.
>> >> >
>> >> > Nevertheless, here is an untested patch.  Does it help?
>> >>
>> >> No bedtime :-)
>> >
>> > Sorry!  Actually, getting results tomorrow would be plenty OK by me.
>> >
>> >> I tried with a revert of...
>> >>
>> >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
>> >> rcu: Handle outgoing CPUs on exit from idle loop
>> >>
>> >> ...and offlining cpu1 seems not to produce the trace...
>> >
>> > As expected.  The trace can still appear, but the outgoing CPU needs to
>> > be delayed by at least one jiffy on its final pass through the idle loop.
>> > Which can really happen in virtualized environments.
>> >
>> >> [  115.280244] PPP BSD Compression module registered
>> >> [  115.288761] PPP Deflate Compression module registered
>> >> [  162.935524] intel_pstate CPU 1 exiting
>> >> [  162.949729] smpboot: CPU 1 is now offline
>> >>
>> >> Will try the patch.
>> >
>> > Looking forward to seeing the results!
>> >
>> > Thanx, Paul
>> >
>> >> - Sedat -
>> >>
>> >> >
>> >> > Thanx, Paul
>> >> >
>> >> >

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 02:18:01AM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney
>  wrote:
> > On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote:
> >> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney
> >>  wrote:
> >> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
> >> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
> >> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
> >> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
> >> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> >> >
> >> > [ . . . ]
> >> >
> >> >> > > > > [ 1144.482666] Disabling non-boot CPUs ...
> >> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
> >> >> > > > > [ 1144.486064]
> >> >> > > > > [ 1144.486065] ===
> >> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
> >> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
> >> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not 
> >> >> > > > > tainted
> >> >> > > > > [ 1144.486070] ---
> >> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> >> >> > > > > rcu_dereference_check() usage!
> >> >> > > > > [ 1144.486073]
> >> >> > > > > [ 1144.486073] other info that might help us debug this:
> >> >> > > > > [ 1144.486073]
> >> >> > > > > [ 1144.486074]
> >> >> > > > > [ 1144.486074] RCU used illegally from offline CPU!
> >> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> >> >> > > > > [ 1144.486076] no locks held by swapper/1/0.
> >> >> > > > > [ 1144.486076]
> >> >> > > > > [ 1144.486076] stack backtrace:
> >> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> >> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
> >> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> >> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 
> >> >> > > > > 03/28/2013
> >> >> > > > > [ 1144.486085]  0001 88011a44fe18 
> >> >> > > > > 817e370d
> >> >> > > > > 0011
> >> >> > > > > [ 1144.486088]  88011a448290 88011a44fe48 
> >> >> > > > > 810d6847
> >> >> > > > > 8800c66b9600
> >> >> > > > > [ 1144.486091]  0001 88011a44c000 
> >> >> > > > > 81cb3900
> >> >> > > > > 88011a44fe78
> >> >> > > > > [ 1144.486092] Call Trace:
> >> >> > > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
> >> >> > > > > [ 1144.486104]  [] 
> >> >> > > > > lockdep_rcu_suspicious+0xe7/0x120
> >> >> > >
> >> >> > > As near as I can tell, idle_task_exit() is running on an offline 
> >> >> > > CPU,
> >> >> > > then calling switch_mm() which contains trace_tlb_flush(), which 
> >> >> > > uses RCU.
> >> >> > > And RCU is objecting to being used from a CPU that it is ignoring.
> >> >> > >
> >> >> > > One approach would be to push RCU's idea of when the CPU goes 
> >> >> > > offline
> >> >> > > down into arch code in this case, using some Kconfig symbol and
> >> >> > > the usual conditional compilation.  Another approach would be to
> >> >> > > invoke the trace calls under cpu_online(), for example, for the
> >> >> > > first such call in switch_mm():
> >> >> > >
> >> >> > >   if (cpu_online(smp_processor_id()))
> >> >> > >   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> >> >> > >
> >> >> > > The compiler would discard this if tracing was disabled.
> >> >> >
> >> >> > That looks like less intrusive to me.
> >> >>
> >> >> One possible concern is increased context-switch path length, but that
> >> >> would only be the case where tracing is enabled by default.
> >> >
> >> > Nevertheless, here is an untested patch.  Does it help?
> >>
> >> No bedtime :-)
> >
> > Sorry!  Actually, getting results tomorrow would be plenty OK by me.
> >
> >> I tried with a revert of...
> >>
> >> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
> >> rcu: Handle outgoing CPUs on exit from idle loop
> >>
> >> ...and offlining cpu1 seems not to produce the trace...
> >
> > As expected.  The trace can still appear, but the outgoing CPU needs to
> > be delayed by at least one jiffy on its final pass through the idle loop.
> > Which can really happen in virtualized environments.
> >
> >> [  115.280244] PPP BSD Compression module registered
> >> [  115.288761] PPP Deflate Compression module registered
> >> [  162.935524] intel_pstate CPU 1 exiting
> >> [  162.949729] smpboot: CPU 1 is now offline
> >>
> >> Will try the patch.
> >
> > Looking forward to seeing the results!
> >
> > Thanx, Paul
> >
> >> - Sedat -
> >>
> >> >
> >> > Thanx, Paul
> >> >
> >> > 
> >> >
> >> > x86: Omit switch_mm() tracing for offline CPUs
> >> >
> >> > The architecture-specific

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 1:57 AM, Paul E. McKenney
 wrote:
> On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote:
>> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney
>>  wrote:
>> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
>> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
>> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
>> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
>> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> >
>> > [ . . . ]
>> >
>> >> > > > > [ 1144.482666] Disabling non-boot CPUs ...
>> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
>> >> > > > > [ 1144.486064]
>> >> > > > > [ 1144.486065] ===
>> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
>> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not 
>> >> > > > > tainted
>> >> > > > > [ 1144.486070] ---
>> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>> >> > > > > rcu_dereference_check() usage!
>> >> > > > > [ 1144.486073]
>> >> > > > > [ 1144.486073] other info that might help us debug this:
>> >> > > > > [ 1144.486073]
>> >> > > > > [ 1144.486074]
>> >> > > > > [ 1144.486074] RCU used illegally from offline CPU!
>> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> >> > > > > [ 1144.486076] no locks held by swapper/1/0.
>> >> > > > > [ 1144.486076]
>> >> > > > > [ 1144.486076] stack backtrace:
>> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
>> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 
>> >> > > > > 03/28/2013
>> >> > > > > [ 1144.486085]  0001 88011a44fe18 817e370d
>> >> > > > > 0011
>> >> > > > > [ 1144.486088]  88011a448290 88011a44fe48 810d6847
>> >> > > > > 8800c66b9600
>> >> > > > > [ 1144.486091]  0001 88011a44c000 81cb3900
>> >> > > > > 88011a44fe78
>> >> > > > > [ 1144.486092] Call Trace:
>> >> > > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
>> >> > > > > [ 1144.486104]  [] 
>> >> > > > > lockdep_rcu_suspicious+0xe7/0x120
>> >> > >
>> >> > > As near as I can tell, idle_task_exit() is running on an offline CPU,
>> >> > > then calling switch_mm() which contains trace_tlb_flush(), which uses 
>> >> > > RCU.
>> >> > > And RCU is objecting to being used from a CPU that it is ignoring.
>> >> > >
>> >> > > One approach would be to push RCU's idea of when the CPU goes offline
>> >> > > down into arch code in this case, using some Kconfig symbol and
>> >> > > the usual conditional compilation.  Another approach would be to
>> >> > > invoke the trace calls under cpu_online(), for example, for the
>> >> > > first such call in switch_mm():
>> >> > >
>> >> > >   if (cpu_online(smp_processor_id()))
>> >> > >   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> >> > >
>> >> > > The compiler would discard this if tracing was disabled.
>> >> >
>> >> > That looks like less intrusive to me.
>> >>
>> >> One possible concern is increased context-switch path length, but that
>> >> would only be the case where tracing is enabled by default.
>> >
>> > Nevertheless, here is an untested patch.  Does it help?
>>
>> No bedtime :-)
>
> Sorry!  Actually, getting results tomorrow would be plenty OK by me.
>
>> I tried with a revert of...
>>
>> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
>> rcu: Handle outgoing CPUs on exit from idle loop
>>
>> ...and offlining cpu1 seems not to produce the trace...
>
> As expected.  The trace can still appear, but the outgoing CPU needs to
> be delayed by at least one jiffy on its final pass through the idle loop.
> Which can really happen in virtualized environments.
>
>> [  115.280244] PPP BSD Compression module registered
>> [  115.288761] PPP Deflate Compression module registered
>> [  162.935524] intel_pstate CPU 1 exiting
>> [  162.949729] smpboot: CPU 1 is now offline
>>
>> Will try the patch.
>
> Looking forward to seeing the results!
>
> Thanx, Paul
>
>> - Sedat -
>>
>> >
>> > Thanx, Paul
>> >
>> > 
>> >
>> > x86: Omit switch_mm() tracing for offline CPUs
>> >
>> > The architecture-specific switch_mm() function can be called by offline
>> > CPUs, but includes event tracing, which cannot be legally carried out
>> > on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
>> > this splat by omitting the tracing when the CPU is offline.
>> >
>> > Reported-by: Sedat Dilek 
>> > Signed-off-by: Paul E. McKenney 
>> >
>> > diff

Re: linux-next: Tree for Feb 4

On Thu, Feb 05, 2015 at 01:30:45AM +0100, Sedat Dilek wrote:
> On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney
>  wrote:
> > On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
> >> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
> >> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
> >> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
> >> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> >
> > [ . . . ]
> >
> >> > > > > [ 1144.482666] Disabling non-boot CPUs ...
> >> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
> >> > > > > [ 1144.486064]
> >> > > > > [ 1144.486065] ===
> >> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
> >> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
> >> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not 
> >> > > > > tainted
> >> > > > > [ 1144.486070] ---
> >> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> >> > > > > rcu_dereference_check() usage!
> >> > > > > [ 1144.486073]
> >> > > > > [ 1144.486073] other info that might help us debug this:
> >> > > > > [ 1144.486073]
> >> > > > > [ 1144.486074]
> >> > > > > [ 1144.486074] RCU used illegally from offline CPU!
> >> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> >> > > > > [ 1144.486076] no locks held by swapper/1/0.
> >> > > > > [ 1144.486076]
> >> > > > > [ 1144.486076] stack backtrace:
> >> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> >> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
> >> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> >> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 
> >> > > > > 03/28/2013
> >> > > > > [ 1144.486085]  0001 88011a44fe18 817e370d
> >> > > > > 0011
> >> > > > > [ 1144.486088]  88011a448290 88011a44fe48 810d6847
> >> > > > > 8800c66b9600
> >> > > > > [ 1144.486091]  0001 88011a44c000 81cb3900
> >> > > > > 88011a44fe78
> >> > > > > [ 1144.486092] Call Trace:
> >> > > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
> >> > > > > [ 1144.486104]  [] 
> >> > > > > lockdep_rcu_suspicious+0xe7/0x120
> >> > >
> >> > > As near as I can tell, idle_task_exit() is running on an offline CPU,
> >> > > then calling switch_mm() which contains trace_tlb_flush(), which uses 
> >> > > RCU.
> >> > > And RCU is objecting to being used from a CPU that it is ignoring.
> >> > >
> >> > > One approach would be to push RCU's idea of when the CPU goes offline
> >> > > down into arch code in this case, using some Kconfig symbol and
> >> > > the usual conditional compilation.  Another approach would be to
> >> > > invoke the trace calls under cpu_online(), for example, for the
> >> > > first such call in switch_mm():
> >> > >
> >> > >   if (cpu_online(smp_processor_id()))
> >> > >   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> >> > >
> >> > > The compiler would discard this if tracing was disabled.
> >> >
> >> > That looks like less intrusive to me.
> >>
> >> One possible concern is increased context-switch path length, but that
> >> would only be the case where tracing is enabled by default.
> >
> > Nevertheless, here is an untested patch.  Does it help?
> 
> No bedtime :-)

Sorry!  Actually, getting results tomorrow would be plenty OK by me.

> I tried with a revert of...
> 
> commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
> rcu: Handle outgoing CPUs on exit from idle loop
> 
> ...and offlining cpu1 seems not to produce the trace...

As expected.  The trace can still appear, but only if the outgoing CPU is
delayed by at least one jiffy on its final pass through the idle loop,
which can really happen in virtualized environments.

> [  115.280244] PPP BSD Compression module registered
> [  115.288761] PPP Deflate Compression module registered
> [  162.935524] intel_pstate CPU 1 exiting
> [  162.949729] smpboot: CPU 1 is now offline
> 
> Will try the patch.

Looking forward to seeing the results!

Thanx, Paul

> - Sedat -
> 
> >
> > Thanx, Paul
> >
> > 
> >
> > x86: Omit switch_mm() tracing for offline CPUs
> >
> > The architecture-specific switch_mm() function can be called by offline
> > CPUs, but includes event tracing, which cannot be legally carried out
> > on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
> > this splat by omitting the tracing when the CPU is offline.
> >
> > Reported-by: Sedat Dilek 
> > Signed-off-by: Paul E. McKenney 
> >
> > diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> > index 40269a2bf6f9..7e7f2445fbc9 100644
> > --- a/arch/x86/include/asm/mmu_context.h

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney
 wrote:
> On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
>> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
>> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
>> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
>> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>
> [ . . . ]
>
>> > > > > [ 1144.482666] Disabling non-boot CPUs ...
>> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
>> > > > > [ 1144.486064]
>> > > > > [ 1144.486065] ===
>> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
>> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
>> > > > > [ 1144.486070] ---
>> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>> > > > > rcu_dereference_check() usage!
>> > > > > [ 1144.486073]
>> > > > > [ 1144.486073] other info that might help us debug this:
>> > > > > [ 1144.486073]
>> > > > > [ 1144.486074]
>> > > > > [ 1144.486074] RCU used illegally from offline CPU!
>> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> > > > > [ 1144.486076] no locks held by swapper/1/0.
>> > > > > [ 1144.486076]
>> > > > > [ 1144.486076] stack backtrace:
>> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
>> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> > > > > [ 1144.486085]  0001 88011a44fe18 817e370d
>> > > > > 0011
>> > > > > [ 1144.486088]  88011a448290 88011a44fe48 810d6847
>> > > > > 8800c66b9600
>> > > > > [ 1144.486091]  0001 88011a44c000 81cb3900
>> > > > > 88011a44fe78
>> > > > > [ 1144.486092] Call Trace:
>> > > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
>> > > > > [ 1144.486104]  [] 
>> > > > > lockdep_rcu_suspicious+0xe7/0x120
>> > >
>> > > As near as I can tell, idle_task_exit() is running on an offline CPU,
>> > > then calling switch_mm() which contains trace_tlb_flush(), which uses 
>> > > RCU.
>> > > And RCU is objecting to being used from a CPU that it is ignoring.
>> > >
>> > > One approach would be to push RCU's idea of when the CPU goes offline
>> > > down into arch code in this case, using some Kconfig symbol and
>> > > the usual conditional compilation.  Another approach would be to
>> > > invoke the trace calls under cpu_online(), for example, for the
>> > > first such call in switch_mm():
>> > >
>> > >   if (cpu_online(smp_processor_id()))
>> > >   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> > >
>> > > The compiler would discard this if tracing was disabled.
>> >
>> > That looks like less intrusive to me.
>>
>> One possible concern is increased context-switch path length, but that
>> would only be the case where tracing is enabled by default.
>
> Nevertheless, here is an untested patch.  Does it help?

No bedtime :-)

I tried with a revert of...

commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
rcu: Handle outgoing CPUs on exit from idle loop

...and offlining cpu1 seems not to produce the trace...

[  115.280244] PPP BSD Compression module registered
[  115.288761] PPP Deflate Compression module registered
[  162.935524] intel_pstate CPU 1 exiting
[  162.949729] smpboot: CPU 1 is now offline

Will try the patch.
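
For anyone wanting to repeat the cpu1 offlining test: one common way to
take a CPU offline is the sysfs hotplug file
/sys/devices/system/cpu/cpuN/online.  Whether the test above was run that
way is an assumption.  The helper names cpu_online_path() and
set_cpu_online() below are hypothetical, and writing the file needs root.
A minimal sketch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/devices/system/cpu/cpuN/online" into buf.
 * Returns 0 on success, -1 on bad arguments or if the path does not fit.
 * (Hypothetical helper, not part of any kernel or libc API.)
 */
int cpu_online_path(int cpu, char *buf, size_t len)
{
	int n;

	if (cpu < 0 || buf == NULL || len == 0)
		return -1;
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "0" (offline) or "1" (online) to the hotplug control file.
 * Needs root; returns 0 on success, -1 on any failure.
 */
int set_cpu_online(int cpu, int online)
{
	char path[64];
	const char *val = online ? "1" : "0";
	FILE *f;
	int rc = 0;

	assert(online == 0 || online == 1);
	if (cpu_online_path(cpu, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fwrite(val, 1, strlen(val), f) != strlen(val))
		rc = -1;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

Calling set_cpu_online(1, 0) as root would correspond to offlining cpu1,
and set_cpu_online(1, 1) would bring it back.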

- Sedat -

>
> Thanx, Paul
>
> 
>
> x86: Omit switch_mm() tracing for offline CPUs
>
> The architecture-specific switch_mm() function can be called by offline
> CPUs, but includes event tracing, which cannot be legally carried out
> on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
> this splat by omitting the tracing when the CPU is offline.
>
> Reported-by: Sedat Dilek 
> Signed-off-by: Paul E. McKenney 
>
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 40269a2bf6f9..7e7f2445fbc9 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -47,7 +47,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>
> 		/* Re-load page tables */
> 		load_cr3(next->pgd);
> -		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> +		if (cpu_online(smp_processor_id()))
> +			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>
> 		/* Stop flush ipis for the previous mm */
> 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
> @@ -84,7 +85,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>

Re: linux-next: Tree for Feb 4

On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
> > > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:

[ . . . ]

> > > > > [ 1144.482666] Disabling non-boot CPUs ...
> > > > > [ 1144.483000] intel_pstate CPU 1 exiting
> > > > > [ 1144.486064]
> > > > > [ 1144.486065] ===
> > > > > [ 1144.486067] smpboot: CPU 1 didn't die...
> > > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
> > > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
> > > > > [ 1144.486070] ---
> > > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> > > > > rcu_dereference_check() usage!
> > > > > [ 1144.486073]
> > > > > [ 1144.486073] other info that might help us debug this:
> > > > > [ 1144.486073]
> > > > > [ 1144.486074]
> > > > > [ 1144.486074] RCU used illegally from offline CPU!
> > > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> > > > > [ 1144.486076] no locks held by swapper/1/0.
> > > > > [ 1144.486076]
> > > > > [ 1144.486076] stack backtrace:
> > > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> > > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
> > > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> > > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> > > > > [ 1144.486085]  0001 88011a44fe18 817e370d
> > > > > 0011
> > > > > [ 1144.486088]  88011a448290 88011a44fe48 810d6847
> > > > > 8800c66b9600
> > > > > [ 1144.486091]  0001 88011a44c000 81cb3900
> > > > > 88011a44fe78
> > > > > [ 1144.486092] Call Trace:
> > > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
> > > > > [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
> > > 
> > > As near as I can tell, idle_task_exit() is running on an offline CPU,
> > > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
> > > And RCU is objecting to being used from a CPU that it is ignoring.
> > > 
> > > One approach would be to push RCU's idea of when the CPU goes offline
> > > down into arch code in this case, using some Kconfig symbol and
> > > the usual conditional compilation.  Another approach would be to
> > > invoke the trace calls under cpu_online(), for example, for the
> > > first such call in switch_mm():
> > > 
> > >   if (cpu_online(smp_processor_id()))
> > >   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> > > 
> > > The compiler would discard this if tracing was disabled.
> > 
> > That looks like less intrusive to me.
> 
> One possible concern is increased context-switch path length, but that
> would only be the case where tracing is enabled by default.

Nevertheless, here is an untested patch.  Does it help?

Thanx, Paul



x86: Omit switch_mm() tracing for offline CPUs

The architecture-specific switch_mm() function can be called by offline
CPUs, but includes event tracing, which cannot be legally carried out
on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
this splat by omitting the tracing when the CPU is offline.

Reported-by: Sedat Dilek 
Signed-off-by: Paul E. McKenney 

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 40269a2bf6f9..7e7f2445fbc9 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -47,7 +47,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 
 		/* Re-load page tables */
 		load_cr3(next->pgd);
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+		if (cpu_online(smp_processor_id()))
+			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
@@ -84,7 +85,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
-			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+			if (cpu_online(smp_processor_id()))
+				trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_LDT_nolock(&next->context);
 		}
 	}
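
For readers following along outside a kernel tree, the effect of the
guard can be sketched in plain userspace C.  This is only an illustration
under stated assumptions: mock_cpu_online(), mock_trace_tlb_flush(), the
mock_switch_mm_*() helpers, mock_reset() and NR_CPUS = 4 are all
hypothetical stand-ins rather than kernel APIs, and the "splats" counter
merely imitates the lockdep-RCU complaint.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4			/* hypothetical CPU count for the mock */

static bool online_mask[NR_CPUS];	/* stand-in for the kernel's online mask */
static int current_cpu;			/* stand-in for smp_processor_id() */
int events_recorded;			/* trace events emitted from online CPUs */
int splats;				/* trace attempts made from an offline CPU */

bool mock_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return online_mask[cpu];
}

/* Stand-in for trace_tlb_flush(): tracing from an offline CPU is the bug. */
void mock_trace_tlb_flush(void)
{
	if (!mock_cpu_online(current_cpu)) {
		splats++;
		return;
	}
	events_recorded++;
}

/* Behaviour before the patch: trace unconditionally. */
void mock_switch_mm_unguarded(void)
{
	mock_trace_tlb_flush();
}

/* Behaviour after the patch: skip tracing when the CPU is offline. */
void mock_switch_mm_guarded(void)
{
	if (mock_cpu_online(current_cpu))
		mock_trace_tlb_flush();
}

/* Mark every CPU online, then set the state of the CPU we "run" on. */
void mock_reset(int cpu, bool online)
{
	int i;

	for (i = 0; i < NR_CPUS; i++)
		online_mask[i] = true;
	online_mask[cpu] = online;
	current_cpu = cpu;
	events_recorded = 0;
	splats = 0;
}
```

With CPU 1 marked offline, the unguarded variant bumps the splat counter
while the guarded one does nothing; with CPU 1 online, the guarded variant
records the event as usual.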


Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 12:51 AM, Paul E. McKenney
 wrote:
> On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
>> On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
>> > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
>> > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> > > > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell 
>> > > >  wrote:
>> > > > > Hi all,
>> > > > >
>> > > > > The next release I will be making will be next-20150209 - which will
>> > > > > probably be after the v3.19 release.
>> > > > >
>> > > > > Changes since 20150203:
>> > > > >
>> > > > > The sound-asoc tree gained a conflict against the sound tree.
>> > > > >
>> > > > > The scsi tree gained a build failure caused by an interaction with 
>> > > > > the
>> > > > > driver-core tree.  I applied a merge fix patch.
>> > > > >
>> > > > > The akpm-current tree gained a build failure for which I disabled
>> > > > > CONFIG_KASAN.
>> > > > >
>> > > > > Non-merge commits (relative to Linus' tree): 7461
>> > > > >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>> > > > >
>> > > > > 
>> > > > >
>> > > >
>> > > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>> > >
>> > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>> > >
>> > > > Hi,
>> > > >
>> > > > after suspend-and-resume I see the following call-trace:
>> > >
>> > > Do you see that after CPU1 offline too?
>> > >
>> > > > ...
>> > > > [ 1144.482666] Disabling non-boot CPUs ...
>> > > > [ 1144.483000] intel_pstate CPU 1 exiting
>> > > > [ 1144.486064]
>> > > > [ 1144.486065] ===
>> > > > [ 1144.486067] smpboot: CPU 1 didn't die...
>> > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
>> > > > [ 1144.486070] ---
>> > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>> > > > rcu_dereference_check() usage!
>> > > > [ 1144.486073]
>> > > > [ 1144.486073] other info that might help us debug this:
>> > > > [ 1144.486073]
>> > > > [ 1144.486074]
>> > > > [ 1144.486074] RCU used illegally from offline CPU!
>> > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> > > > [ 1144.486076] no locks held by swapper/1/0.
>> > > > [ 1144.486076]
>> > > > [ 1144.486076] stack backtrace:
>> > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
>> > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> > > > [ 1144.486085]  0001 88011a44fe18 817e370d
>> > > > 0011
>> > > > [ 1144.486088]  88011a448290 88011a44fe48 810d6847
>> > > > 8800c66b9600
>> > > > [ 1144.486091]  0001 88011a44c000 81cb3900
>> > > > 88011a44fe78
>> > > > [ 1144.486092] Call Trace:
>> > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
>> > > > [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
>> >
>> > As near as I can tell, idle_task_exit() is running on an offline CPU,
>> > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
>> > And RCU is objecting to being used from a CPU that it is ignoring.
>> >
>> > One approach would be to push RCU's idea of when the CPU goes offline
>> > down into arch code in this case, using some Kconfig symbol and
>> > the usual conditional compilation.  Another approach would be to
>> > invoke the trace calls under cpu_online(), for example, for the
>> > first such call in switch_mm():
>> >
>> > if (cpu_online(smp_processor_id()))
>> > trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> >
>> > The compiler would discard this if tracing was disabled.
>>
>> That looks like less intrusive to me.
>
> One possible concern is increased context-switch path length, but that
> would only be the case where tracing is enabled by default.
>

Hmmm, which kernel-config "trace" options do you mean in particular?

>> > Other thoughts?
>>
>> Well, the whole issue here seems to be that common code using RCU is also
>> useful in places where RCU doesn't want to be used.  Arguably, we can deal
>> with all of those cases in a whack-a-mole manner, but that doesn't seem to
>> scale too well.
>
> Well, I did put a change into -next that makes these particular moles
> stick their heads up farther, so this is not a random event.  And in
> this particular case, we do have the option of extending RCU's reach to
> cover this operation, at the expense of a bit more intrusion by RCU into
> arch-specific code.  If tracing is enabled by default by major distros,
> that might be the right thing to do, unappealing though it might be.
>

Can you point me to that change in rcu-next?

> But yes, it would have been far

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 12:25 AM, Rafael J. Wysocki  wrote:
> On Wednesday, February 04, 2015 11:38:40 PM Sedat Dilek wrote:
>> On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki  
>> wrote:
>> > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
>> >> wrote:
>> >> > Hi all,
>> >> >
>> >> > The next release I will be making will be next-20150209 - which will
>> >> > probably be after the v3.19 release.
>> >> >
>> >> > Changes since 20150203:
>> >> >
>> >> > The sound-asoc tree gained a conflict against the sound tree.
>> >> >
>> >> > The scsi tree gained a build failure caused by an interaction with the
>> >> > driver-core tree.  I applied a merge fix patch.
>> >> >
>> >> > The akpm-current tree gained a build failure for which I disabled
>> >> > CONFIG_KASAN.
>> >> >
>> >> > Non-merge commits (relative to Linus' tree): 7461
>> >> >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>> >> >
>> >> > 
>> >> >
>> >>
>> >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>> >
>> > Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>> >
>>
>> Yupp, I forwarded my original posting before you answered me.
>>
>> >> Hi,
>> >>
>> >> after suspend-and-resume I see the following call-trace:
>> >
>> > Do you see that after CPU1 offline too?
>> >
>>
>> Did not check yet.
>>
>> >> ...
>> >> [ 1144.482666] Disabling non-boot CPUs ...
>> >> [ 1144.483000] intel_pstate CPU 1 exiting
>> >> [ 1144.486064]
>> >> [ 1144.486065] ===
>> >> [ 1144.486067] smpboot: CPU 1 didn't die...
>> >> [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> >> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
>> >> [ 1144.486070] ---
>> >> [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>> >> rcu_dereference_check() usage!
>> >> [ 1144.486073]
>> >> [ 1144.486073] other info that might help us debug this:
>> >> [ 1144.486073]
>> >> [ 1144.486074]
>> >> [ 1144.486074] RCU used illegally from offline CPU!
>> >> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> >> [ 1144.486076] no locks held by swapper/1/0.
>> >> [ 1144.486076]
>> >> [ 1144.486076] stack backtrace:
>> >> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> >> 3.19.0-rc7-next-20150204.1-iniza-small #1
>> >> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> >> [ 1144.486085]  0001 88011a44fe18 817e370d
>> >> 0011
>> >> [ 1144.486088]  88011a448290 88011a44fe48 810d6847
>> >> 8800c66b9600
>> >> [ 1144.486091]  0001 88011a44c000 81cb3900
>> >> 88011a44fe78
>> >> [ 1144.486092] Call Trace:
>> >> [ 1144.486099]  [] dump_stack+0x4c/0x65
>> >> [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
>> >> [ 1144.486109]  [] idle_task_exit+0x205/0x2c0
>> >> [ 1144.486113]  [] play_dead_common+0xe/0x50
>> >> [ 1144.486116]  [] native_play_dead+0x15/0x140
>> >> [ 1144.486121]  [] arch_cpu_idle_dead+0xf/0x20
>> >> [ 1144.486123]  [] cpu_startup_entry+0x37e/0x580
>> >> [ 1144.486126]  [] start_secondary+0x140/0x150
>> >> [ 1144.502920] intel_pstate CPU 2 exiting
>> >> ...
>> >>
>> >> Not sure if this comes from the rcu or pm/intel_pstate area.
>> >
>> > New intel_pstate commits in linux-next are between 7ab0256e57ae and
>> > a04759924e25 inclusive.  Please check that range first.
>> >
>>
>> Not sure if I am willing to test with reverted patches.
>> ( /me was updating the Linux graphics driver stack today, built with
>> the upcoming llvm-toolchain v3.6.0. )
>>
>> > If that doesn't point you to the offender, you can pull the linux-next
>> > branch of the linux-pm.git tree at:
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>> >
>> > and see if that alone triggers the issue for you.  If not, the offender is
>> > not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
>> > check the acpi-processor merge point too.
>> >
>>
>> I pulled in pm-next-20150204 on top of next-20150204, but that did not help.
>
> What I was asking about was to test linux-pm.git/linux-next *instead* *of*
> full linux-next and not on top of it.  That would tell you whether or not
> the new trace was introduced by one of the PM commits or elsewhere.
>

No, I did not test this.

> But this most likely is what Paul said anyway.
>

Not sure what you mean by this statement.

I tried -3 kernel with...

f64b348810c2 Revert "intel_pstate: Add support for SkyLake"
a0d825a39848 Revert "intel_pstate: expose turbo range to sysfs"
847153608ecf Revert "intel_pstate: Add num_pstates to sysfs"
412a6770cde4 Revert "intel_pstate: respect cpufreq policy request"
e2a6685023ed Revert "intel_pstate: honor user space min_perf_pct override on resume"

Re: linux-next: Tree for Feb 4

On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> > > > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell 
> > > >  wrote:
> > > > > Hi all,
> > > > >
> > > > > The next release I will be making will be next-20150209 - which will
> > > > > probably be after the v3.19 release.
> > > > >
> > > > > Changes since 20150203:
> > > > >
> > > > > The sound-asoc tree gained a conflict against the sound tree.
> > > > >
> > > > > The scsi tree gained a build failure caused by an interaction with the
> > > > > driver-core tree.  I applied a merge fix patch.
> > > > >
> > > > > The akpm-current tree gained a build failure for which I disabled
> > > > > CONFIG_KASAN.
> > > > >
> > > > > Non-merge commits (relative to Linus' tree): 7461
> > > > >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
> > > > >
> > > > > 
> > > > >
> > > > 
> > > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
> > > 
> > > Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
> > > 
> > > > Hi,
> > > > 
> > > > after suspend-and-resume I see the following call-trace:
> > > 
> > > Do you see that after CPU1 offline too? 
> > > 
> > > > ...
> > > > [ 1144.482666] Disabling non-boot CPUs ...
> > > > [ 1144.483000] intel_pstate CPU 1 exiting
> > > > [ 1144.486064]
> > > > [ 1144.486065] ===
> > > > [ 1144.486067] smpboot: CPU 1 didn't die...
> > > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
> > > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
> > > > [ 1144.486070] ---
> > > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> > > > rcu_dereference_check() usage!
> > > > [ 1144.486073]
> > > > [ 1144.486073] other info that might help us debug this:
> > > > [ 1144.486073]
> > > > [ 1144.486074]
> > > > [ 1144.486074] RCU used illegally from offline CPU!
> > > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> > > > [ 1144.486076] no locks held by swapper/1/0.
> > > > [ 1144.486076]
> > > > [ 1144.486076] stack backtrace:
> > > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> > > > 3.19.0-rc7-next-20150204.1-iniza-small #1
> > > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> > > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> > > > [ 1144.486085]  0001 88011a44fe18 817e370d
> > > > 0011
> > > > [ 1144.486088]  88011a448290 88011a44fe48 810d6847
> > > > 8800c66b9600
> > > > [ 1144.486091]  0001 88011a44c000 81cb3900
> > > > 88011a44fe78
> > > > [ 1144.486092] Call Trace:
> > > > [ 1144.486099]  [] dump_stack+0x4c/0x65
> > > > [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
> > 
> > As near as I can tell, idle_task_exit() is running on an offline CPU,
> > then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
> > And RCU is objecting to being used from a CPU that it is ignoring.
> > 
> > One approach would be to push RCU's idea of when the CPU goes offline
> > down into arch code in this case, using some Kconfig symbol and
> > the usual conditional compilation.  Another approach would be to
> > invoke the trace calls under cpu_online(), for example, for the
> > first such call in switch_mm():
> > 
> > 	if (cpu_online(smp_processor_id()))
> > 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> > 
> > The compiler would discard this if tracing was disabled.
> 
> That looks less intrusive to me.

One possible concern is increased context-switch path length, but that
would only be the case where tracing is enabled by default.

> > Other thoughts?
> 
> Well, the whole issue here seems to be that common code using RCU is also
> useful in places where RCU doesn't want to be used.  Arguably, we can deal
> with all of those cases in a whack-a-mole manner, but that doesn't seem to
> scale too well.

Well, I did put a change into -next that makes these particular moles
stick their heads up farther, so this is not a random event.  And in
this particular case, we do have the option of extending RCU's reach to
cover this operation, at the expense of a bit more intrusion by RCU into
arch-specific code.  If tracing is enabled by default by major distros,
that might be the right thing to do, unappealing though it might be.

But yes, it would have been far better for RCU to have been picky to
begin with, so that these issues could have been addressed as they were
added to the kernel.  I guess one possible source of comfort is that once
this is in place, future issues will make themselves immediately apparent.

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 12:30 AM, Rafael J. Wysocki  wrote:
> On Wednesday, February 04, 2015 11:46:32 PM Sedat Dilek wrote:
>> On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki  
>> wrote:
>> > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
>> >> wrote:
>> >> > Hi all,
>> >> >
>> >> > The next release I will be making will be next-20150209 - which will
>> >> > probably be after the v3.19 release.
>> >> >
>> >> > Changes since 20150203:
>> >> >
>> >> > The sound-asoc tree gained a conflict against the sound tree.
>> >> >
>> >> > The scsi tree gained a build failure caused by an interaction with the
>> >> > driver-core tree.  I applied a merge fix patch.
>> >> >
>> >> > The akpm-current tree gained a build failure for which I disabled
>> >> > CONFIG_KASAN.
>> >> >
>> >> > Non-merge commits (relative to Linus' tree): 7461
>> >> >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>> >> >
>> >> > 
>> >> >
>> >>
>> >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>> >
>> > Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>> >
>> >> Hi,
>> >>
>> >> after suspend-and-resume I see the following call-trace:
>> >
>> > Do you see that after CPU1 offline too?
>> >
>>
>> NO.
>>
>> After...
>>
>> root# echo 0 > /sys/devices/system/cpu/cpu1/online
>>
>> ...I see this:
>>
>> +[  707.936668] PM: Saving platform NVS memory
>> +[  707.936674] Disabling non-boot CPUs ...
>> +[  707.936712] intel_pstate CPU 2 exiting
>> +[  707.938024] smpboot: CPU 2 didn't die...
>> +[  707.949128] intel_pstate CPU 3 exiting
>> +[  707.950369] smpboot: CPU 3 didn't die...
>> +[  707.966248] ACPI: Low-level resume complete
>> +[  707.966302] PM: Restoring platform NVS memory
>>
>> Full dmesg attached.
>
> The dmesg doesn't match what you said above.
>
> Anyway, that's not what I meant.  Does the CPU1 offlining alone:
>
> # echo 0 > /sys/devices/system/cpu/cpu1/online
>
> trigger the trace?  It should.
>

YES, I see this...
...
[   84.668616] PPP BSD Compression module registered
[   84.678072] PPP Deflate Compression module registered
[  101.143582] intel_pstate CPU 1 exiting
[  101.157134]
[  101.157135] ===
[  101.157136] [ INFO: suspicious RCU usage. ]
[  101.157139] 3.19.0-rc7-next-20150204.3-iniza-small #1 Not tainted
[  101.157140] ---
[  101.157142] include/trace/events/tlb.h:35 suspicious
rcu_dereference_check() usage!
[  101.157142]
[  101.157142] other info that might help us debug this:
[  101.157142]
[  101.157143]
[  101.157143] RCU used illegally from offline CPU!
[  101.157143] rcu_scheduler_active = 1, debug_locks = 0
[  101.157144] no locks held by swapper/1/0.
[  101.157144]
[  101.157144] stack backtrace:
[  101.157146] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
3.19.0-rc7-next-20150204.3-iniza-small #1
[  101.157147] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[  101.157151]  0001 88011a44fe18 817e35fd
0011
[  101.157153]  88011a448290 88011a44fe48 810d6847
8800d3b96100
[  101.157155]  0001 88011a44c000 0005
88011a44fe78
[  101.157156] Call Trace:
[  101.157162]  [] dump_stack+0x4c/0x65
[  101.157166]  [] lockdep_rcu_suspicious+0xe7/0x120
[  101.157170]  [] idle_task_exit+0x205/0x2c0
[  101.157173]  [] play_dead_common+0xe/0x50
[  101.157175]  [] native_play_dead+0x15/0x140
[  101.157179]  [] arch_cpu_idle_dead+0xf/0x20
[  101.157181]  [] cpu_startup_entry+0x37e/0x580
[  101.157183]  [] start_secondary+0x140/0x150
[  101.157228] smpboot: CPU 1 is now offline

- Sedat -

Re: linux-next: Tree for Feb 4

On Wednesday, February 04, 2015 11:46:32 PM Sedat Dilek wrote:
> On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki  wrote:
> > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
> >> wrote:
> >> > Hi all,
> >> >
> >> > The next release I will be making will be next-20150209 - which will
> >> > probably be after the v3.19 release.
> >> >
> >> > Changes since 20150203:
> >> >
> >> > The sound-asoc tree gained a conflict against the sound tree.
> >> >
> >> > The scsi tree gained a build failure caused by an interaction with the
> >> > driver-core tree.  I applied a merge fix patch.
> >> >
> >> > The akpm-current tree gained a build failure for which I disabled
> >> > CONFIG_KASAN.
> >> >
> >> > Non-merge commits (relative to Linus' tree): 7461
> >> >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
> >> >
> >> > 
> >> >
> >>
> >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
> >
> > Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
> >
> >> Hi,
> >>
> >> after suspend-and-resume I see the following call-trace:
> >
> > Do you see that after CPU1 offline too?
> >
> 
> NO.
> 
> After...
> 
> root# echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> ...I see this:
> 
> +[  707.936668] PM: Saving platform NVS memory
> +[  707.936674] Disabling non-boot CPUs ...
> +[  707.936712] intel_pstate CPU 2 exiting
> +[  707.938024] smpboot: CPU 2 didn't die...
> +[  707.949128] intel_pstate CPU 3 exiting
> +[  707.950369] smpboot: CPU 3 didn't die...
> +[  707.966248] ACPI: Low-level resume complete
> +[  707.966302] PM: Restoring platform NVS memory
> 
> Full dmesg attached.

The dmesg doesn't match what you said above.

Anyway, that's not what I meant.  Does the CPU1 offlining alone:

# echo 0 > /sys/devices/system/cpu/cpu1/online

trigger the trace?  It should.
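
For repeated testing, the same offlining step can be driven from a small C helper. This is only a sketch: set_cpu_online() is a hypothetical name, and the sysfs path is passed in as a parameter so the helper can also be tried against an ordinary file. Writing to the real file requires root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "0" or "1" to a CPU's sysfs "online" file,
 * e.g. /sys/devices/system/cpu/cpu1/online.
 * Returns 0 on success and -1 on failure.
 */
static int set_cpu_online(const char *path, int online)
{
	assert(path != NULL);

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	/* The kernel parses the value when the write reaches the file. */
	int ok = fprintf(f, "%d\n", online ? 1 : 0) > 0;

	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling set_cpu_online("/sys/devices/system/cpu/cpu1/online", 0) corresponds to the echo command above; passing 1 brings the CPU back online.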


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: linux-next: Tree for Feb 4

On Wednesday, February 04, 2015 11:38:40 PM Sedat Dilek wrote:
> On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki  wrote:
> > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> >> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
> >> wrote:
> >> > Hi all,
> >> >
> >> > The next release I will be making will be next-20150209 - which will
> >> > probably be after the v3.19 release.
> >> >
> >> > Changes since 20150203:
> >> >
> >> > The sound-asoc tree gained a conflict against the sound tree.
> >> >
> >> > The scsi tree gained a build failure caused by an interaction with the
> >> > driver-core tree.  I applied a merge fix patch.
> >> >
> >> > The akpm-current tree gained a build failure for which I disabled
> >> > CONFIG_KASAN.
> >> >
> >> > Non-merge commits (relative to Linus' tree): 7461
> >> >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
> >> >
> >> > 
> >> >
> >>
> >> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
> >
> > Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
> >
> 
> Yupp, I forwarded my original posting before you answered me.
> 
> >> Hi,
> >>
> >> after suspend-and-resume I see the following call-trace:
> >
> > Do you see that after CPU1 offline too?
> >
> 
> Did not check yet.
> 
> >> ...
> >> [ 1144.482666] Disabling non-boot CPUs ...
> >> [ 1144.483000] intel_pstate CPU 1 exiting
> >> [ 1144.486064]
> >> [ 1144.486065] ===
> >> [ 1144.486067] smpboot: CPU 1 didn't die...
> >> [ 1144.486067] [ INFO: suspicious RCU usage. ]
> >> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
> >> [ 1144.486070] ---
> >> [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> >> rcu_dereference_check() usage!
> >> [ 1144.486073]
> >> [ 1144.486073] other info that might help us debug this:
> >> [ 1144.486073]
> >> [ 1144.486074]
> >> [ 1144.486074] RCU used illegally from offline CPU!
> >> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> >> [ 1144.486076] no locks held by swapper/1/0.
> >> [ 1144.486076]
> >> [ 1144.486076] stack backtrace:
> >> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> >> 3.19.0-rc7-next-20150204.1-iniza-small #1
> >> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> >> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> >> [ 1144.486085]  0001 88011a44fe18 817e370d
> >> 0011
> >> [ 1144.486088]  88011a448290 88011a44fe48 810d6847
> >> 8800c66b9600
> >> [ 1144.486091]  0001 88011a44c000 81cb3900
> >> 88011a44fe78
> >> [ 1144.486092] Call Trace:
> >> [ 1144.486099]  [] dump_stack+0x4c/0x65
> >> [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
> >> [ 1144.486109]  [] idle_task_exit+0x205/0x2c0
> >> [ 1144.486113]  [] play_dead_common+0xe/0x50
> >> [ 1144.486116]  [] native_play_dead+0x15/0x140
> >> [ 1144.486121]  [] arch_cpu_idle_dead+0xf/0x20
> >> [ 1144.486123]  [] cpu_startup_entry+0x37e/0x580
> >> [ 1144.486126]  [] start_secondary+0x140/0x150
> >> [ 1144.502920] intel_pstate CPU 2 exiting
> >> ...
> >>
> >> Not sure if this comes from the rcu or pm/intel_pstate area.
> >
> > New intel_pstate commits in linux-next are between 7ab0256e57ae and
> > a04759924e25 inclusive.  Please check that range first.
> >
> 
> Not sure if I am willing to test with reverted patches.
> ( /me was updating Linux graphic driver stack today built with
> upcoming llvm-toolchain v3.6.0. )
> 
> > If that doesn't point you to the offender, you can pull the linux-next
> > branch of the linux-pm.git tree at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> >
> > and see if that alone triggers the issue for you.  If not, the offender is
> > not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
> > check the acpi-processor merge point too.
> >
> 
> I pulled in pm-next-20150204 on top of next-20150204, but that did not help.

What I was asking about was to test linux-pm.git/linux-next *instead* *of* full
linux-next and not on top of it.  That would tell you whether or not the new
trace was introduced by one of the PM commits or elsewhere.

But this most likely is what Paul said anyway.


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: linux-next: Tree for Feb 4

On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki  wrote:
> On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
>> wrote:
>> > Hi all,
>> >
>> > The next release I will be making will be next-20150209 - which will
>> > probably be after the v3.19 release.
>> >
>> > Changes since 20150203:
>> >
>> > The sound-asoc tree gained a conflict against the sound tree.
>> >
>> > The scsi tree gained a build failure caused by an interaction with the
>> > driver-core tree.  I applied a merge fix patch.
>> >
>> > The akpm-current tree gained a build failure for which I disabled
>> > CONFIG_KASAN.
>> >
>> > Non-merge commits (relative to Linus' tree): 7461
>> >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>> >
>> > 
>> >
>>
>> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>
> Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>
>> Hi,
>>
>> after suspend-and-resume I see the following call-trace:
>
> Do you see that after CPU1 offline too?
>

NO.

After...

root# echo 0 > /sys/devices/system/cpu/cpu1/online

...I see this:

+[  707.936668] PM: Saving platform NVS memory
+[  707.936674] Disabling non-boot CPUs ...
+[  707.936712] intel_pstate CPU 2 exiting
+[  707.938024] smpboot: CPU 2 didn't die...
+[  707.949128] intel_pstate CPU 3 exiting
+[  707.950369] smpboot: CPU 3 didn't die...
+[  707.966248] ACPI: Low-level resume complete
+[  707.966302] PM: Restoring platform NVS memory

Full dmesg attached.

I hope this helps.

- Sedat -

>> ...
>> [ 1144.482666] Disabling non-boot CPUs ...
>> [ 1144.483000] intel_pstate CPU 1 exiting
>> [ 1144.486064]
>> [ 1144.486065] ===
>> [ 1144.486067] smpboot: CPU 1 didn't die...
>> [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
>> [ 1144.486070] ---
>> [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>> rcu_dereference_check() usage!
>> [ 1144.486073]
>> [ 1144.486073] other info that might help us debug this:
>> [ 1144.486073]
>> [ 1144.486074]
>> [ 1144.486074] RCU used illegally from offline CPU!
>> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> [ 1144.486076] no locks held by swapper/1/0.
>> [ 1144.486076]
>> [ 1144.486076] stack backtrace:
>> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> 3.19.0-rc7-next-20150204.1-iniza-small #1
>> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [ 1144.486085]  0001 88011a44fe18 817e370d
>> 0011
>> [ 1144.486088]  88011a448290 88011a44fe48 810d6847
>> 8800c66b9600
>> [ 1144.486091]  0001 88011a44c000 81cb3900
>> 88011a44fe78
>> [ 1144.486092] Call Trace:
>> [ 1144.486099]  [] dump_stack+0x4c/0x65
>> [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
>> [ 1144.486109]  [] idle_task_exit+0x205/0x2c0
>> [ 1144.486113]  [] play_dead_common+0xe/0x50
>> [ 1144.486116]  [] native_play_dead+0x15/0x140
>> [ 1144.486121]  [] arch_cpu_idle_dead+0xf/0x20
>> [ 1144.486123]  [] cpu_startup_entry+0x37e/0x580
>> [ 1144.486126]  [] start_secondary+0x140/0x150
>> [ 1144.502920] intel_pstate CPU 2 exiting
>> ...
>>
>> Not sure if this comes from the rcu or pm/intel_pstate area.
>
> New intel_pstate commits in linux-next are between 7ab0256e57ae and
> a04759924e25 inclusive.  Please check that range first.
>
> If that doesn't point you to the offender, you can pull the linux-next
> branch of the linux-pm.git tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>
> and see if that alone triggers the issue for you.  If not, the offender is
> not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
> check the acpi-processor merge point too.
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.19.0-rc7-next-20150204.2-iniza-small 
(sedat.di...@gmail.com@fambox) (gcc version 4.9.2 (Ubuntu 4.9.2-0ubuntu1~12.04) 
) #1 SMP Wed Feb 4 23:25:30 CET 2015
[0.00] Command line: 
BOOT_IMAGE=/boot/vmlinuz-3.19.0-rc7-next-20150204.2-iniza-small 
root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] Disabled fast string operations
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[

Re: linux-next: Tree for Feb 4

On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki  wrote:
> On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
>> wrote:
>> > Hi all,
>> >
>> > The next release I will be making will be next-20150209 - which will
>> > probably be after the v3.19 release.
>> >
>> > Changes since 20150203:
>> >
>> > The sound-asoc tree gained a conflict against the sound tree.
>> >
>> > The scsi tree gained a build failure caused by an interaction with the
>> > driver-core tree.  I applied a merge fix patch.
>> >
>> > The akpm-current tree gained a build failure for which I disabled
>> > CONFIG_KASAN.
>> >
>> > Non-merge commits (relative to Linus' tree): 7461
>> >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>> >
>> > 
>> >
>>
>> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
>
> Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
>

Yupp, I forwarded my original posting before you answered me.

>> Hi,
>>
>> after suspend-and-resume I see the following call-trace:
>
> Do you see that after CPU1 offline too?
>

Did not check yet.

>> ...
>> [ 1144.482666] Disabling non-boot CPUs ...
>> [ 1144.483000] intel_pstate CPU 1 exiting
>> [ 1144.486064]
>> [ 1144.486065] ===
>> [ 1144.486067] smpboot: CPU 1 didn't die...
>> [ 1144.486067] [ INFO: suspicious RCU usage. ]
>> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
>> [ 1144.486070] ---
>> [ 1144.486072] include/trace/events/tlb.h:35 suspicious
>> rcu_dereference_check() usage!
>> [ 1144.486073]
>> [ 1144.486073] other info that might help us debug this:
>> [ 1144.486073]
>> [ 1144.486074]
>> [ 1144.486074] RCU used illegally from offline CPU!
>> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
>> [ 1144.486076] no locks held by swapper/1/0.
>> [ 1144.486076]
>> [ 1144.486076] stack backtrace:
>> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> 3.19.0-rc7-next-20150204.1-iniza-small #1
>> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [ 1144.486085]  0001 88011a44fe18 817e370d
>> 0011
>> [ 1144.486088]  88011a448290 88011a44fe48 810d6847
>> 8800c66b9600
>> [ 1144.486091]  0001 88011a44c000 81cb3900
>> 88011a44fe78
>> [ 1144.486092] Call Trace:
>> [ 1144.486099]  [] dump_stack+0x4c/0x65
>> [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
>> [ 1144.486109]  [] idle_task_exit+0x205/0x2c0
>> [ 1144.486113]  [] play_dead_common+0xe/0x50
>> [ 1144.486116]  [] native_play_dead+0x15/0x140
>> [ 1144.486121]  [] arch_cpu_idle_dead+0xf/0x20
>> [ 1144.486123]  [] cpu_startup_entry+0x37e/0x580
>> [ 1144.486126]  [] start_secondary+0x140/0x150
>> [ 1144.502920] intel_pstate CPU 2 exiting
>> ...
>>
>> Not sure if this comes from the rcu or pm/intel_pstate area.
>
> New intel_pstate commits in linux-next are between 7ab0256e57ae and
> a04759924e25 inclusive.  Please check that range first.
>

Not sure if I am willing to test with reverted patches.
( /me was updating the Linux graphics driver stack today, built with the
upcoming llvm-toolchain v3.6.0. )

> If that doesn't point you to the offender, you can pull the linux-next
> branch of the linux-pm.git tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>
> and see if that alone triggers the issue for you.  If not, the offender is
> not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
> check the acpi-processor merge point too.
>

I pulled in pm-next-20150204 on top of next-20150204, but that did not help.

- Sedat -
Jiang Liu (8):
  ACPI: Fix a bug in parsing ACPI Memory24 resource
  ACPI: Normalize return value of resource parser functions
  ACPI: Set flag IORESOURCE_UNSET for unassigned resources
  ACPI: Enforce stricter checks for address space descriptors
  ACPI: Return translation offset when parsing ACPI address space resources
  ACPI: Translate resource into master side address for bridge window resources
  ACPI: Add field offset to struct resource_list_entry
  ACPI: Introduce helper function acpi_dev_filter_resource_type()

Markus Elfring (1):
  cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation

Rafael J. Wysocki (14):
  ACPI / cpuidle: Drop unnecessary calls from acpi_idle_do_entry()
  ACPI / cpuidle: Drop unnecessary calls from ->enter callback routines
  ACPI / cpuidle: Clean up fallback to C1 checks
  ACPI / cpuidle: Drop irrelevant comment from acpi_idle_enter_simple()
  ACPI / cpuidle: Clean up white space in a switch statement
  ACPI / cpuidle: Drop flags.bm_check tests from acpi_idle_enter_bm()
  ACPI / cpuidle: Merge acpi_idle_enter_c1()

Re: linux-next: Tree for Feb 4

On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> > > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
> > > wrote:
> > > > Hi all,
> > > >
> > > > The next release I will be making will be next-20150209 - which will
> > > > probably be after the v3.19 release.
> > > >
> > > > Changes since 20150203:
> > > >
> > > > The sound-asoc tree gained a conflict against the sound tree.
> > > >
> > > > The scsi tree gained a build failure caused by an interaction with the
> > > > driver-core tree.  I applied a merge fix patch.
> > > >
> > > > The akpm-current tree gained a build failure for which I disabled
> > > > CONFIG_KASAN.
> > > >
> > > > Non-merge commits (relative to Linus' tree): 7461
> > > >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
> > > >
> > > > 
> > > >
> > > 
> > > [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
> > 
> > Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
> > 
> > > Hi,
> > > 
> > > after suspend-and-resume I see the following call-trace:
> > 
> > Do you see that after CPU1 offline too? 
> > 
> > > ...
> > > [ 1144.482666] Disabling non-boot CPUs ...
> > > [ 1144.483000] intel_pstate CPU 1 exiting
> > > [ 1144.486064]
> > > [ 1144.486065] ===
> > > [ 1144.486067] smpboot: CPU 1 didn't die...
> > > [ 1144.486067] [ INFO: suspicious RCU usage. ]
> > > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
> > > [ 1144.486070] ---
> > > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> > > rcu_dereference_check() usage!
> > > [ 1144.486073]
> > > [ 1144.486073] other info that might help us debug this:
> > > [ 1144.486073]
> > > [ 1144.486074]
> > > [ 1144.486074] RCU used illegally from offline CPU!
> > > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> > > [ 1144.486076] no locks held by swapper/1/0.
> > > [ 1144.486076]
> > > [ 1144.486076] stack backtrace:
> > > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> > > 3.19.0-rc7-next-20150204.1-iniza-small #1
> > > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> > > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> > > [ 1144.486085]  0001 88011a44fe18 817e370d
> > > 0011
> > > [ 1144.486088]  88011a448290 88011a44fe48 810d6847
> > > 8800c66b9600
> > > [ 1144.486091]  0001 88011a44c000 81cb3900
> > > 88011a44fe78
> > > [ 1144.486092] Call Trace:
> > > [ 1144.486099]  [] dump_stack+0x4c/0x65
> > > [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
> 
> As near as I can tell, idle_task_exit() is running on an offline CPU,
> then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
> And RCU is objecting to being used from a CPU that it is ignoring.
> 
> One approach would be to push RCU's idea of when the CPU goes offline
> down into arch code in this case, using some Kconfig symbol and
> the usual conditional compilation.  Another approach would be to
> invoke the trace calls under cpu_online(), for example, for the
> first such call in switch_mm():
> 
> 	if (cpu_online(smp_processor_id()))
> 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
> 
> The compiler would discard this if tracing was disabled.

That looks less intrusive to me.

> Other thoughts?

Well, the whole issue here seems to be that common code using RCU is also
useful in places where RCU doesn't want to be used.  Arguably, we can deal
with all of those cases in a whack-a-mole manner, but that doesn't seem to
scale too well.

Rafael


Re: linux-next: Tree for Feb 4

On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> > On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
> > wrote:
> > > Hi all,
> > >
> > > The next release I will be making will be next-20150209 - which will
> > > probably be after the v3.19 release.
> > >
> > > Changes since 20150203:
> > >
> > > The sound-asoc tree gained a conflict against the sound tree.
> > >
> > > The scsi tree gained a build failure caused by an interaction with the
> > > driver-core tree.  I applied a merge fix patch.
> > >
> > > The akpm-current tree gained a build failure for which I disabled
> > > CONFIG_KASAN.
> > >
> > > Non-merge commits (relative to Linus' tree): 7461
> > >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
> > >
> > > 
> > >
> > 
> > [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
> 
> Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
> 
> > Hi,
> > 
> > after suspend-and-resume I see the following call-trace:
> 
> Do you see that after CPU1 offline too? 
> 
> > ...
> > [ 1144.482666] Disabling non-boot CPUs ...
> > [ 1144.483000] intel_pstate CPU 1 exiting
> > [ 1144.486064]
> > [ 1144.486065] ===
> > [ 1144.486067] smpboot: CPU 1 didn't die...
> > [ 1144.486067] [ INFO: suspicious RCU usage. ]
> > [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
> > [ 1144.486070] ---
> > [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> > rcu_dereference_check() usage!
> > [ 1144.486073]
> > [ 1144.486073] other info that might help us debug this:
> > [ 1144.486073]
> > [ 1144.486074]
> > [ 1144.486074] RCU used illegally from offline CPU!
> > [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> > [ 1144.486076] no locks held by swapper/1/0.
> > [ 1144.486076]
> > [ 1144.486076] stack backtrace:
> > [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> > 3.19.0-rc7-next-20150204.1-iniza-small #1
> > [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> > [ 1144.486085]  0001 88011a44fe18 817e370d
> > 0011
> > [ 1144.486088]  88011a448290 88011a44fe48 810d6847
> > 8800c66b9600
> > [ 1144.486091]  0001 88011a44c000 81cb3900
> > 88011a44fe78
> > [ 1144.486092] Call Trace:
> > [ 1144.486099]  [] dump_stack+0x4c/0x65
> > [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120

As near as I can tell, idle_task_exit() is running on an offline CPU,
then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
And RCU is objecting to being used from a CPU that it is ignoring.

One approach would be to push RCU's idea of when the CPU goes offline
down into arch code in this case, using some Kconfig symbol and
the usual conditional compilation.  Another approach would be to
invoke the trace calls under cpu_online(), for example, for the
first such call in switch_mm():

	if (cpu_online(smp_processor_id()))
		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

The compiler would discard this if tracing was disabled.

Other thoughts?

Note that this use of RCU from an offline CPU is currently tolerated,
but is vulnerable to delays, for example, due to virtualization.
If a CPU takes more than one jiffy to get from _stop_machine() state
to fully offlined, life can be very hard.
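
The cpu_online() guard suggested above can be modelled in userspace
roughly as follows.  This is only a sketch under stated assumptions:
every model_* name is hypothetical, the online map is a plain array
rather than the kernel's cpumask, and the "tracepoint" merely counts
calls.  It shows the shape of the check, not the actual switch_mm()
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: none of these names exist in the kernel. */
#define MODEL_NR_CPUS 4

static bool model_online[MODEL_NR_CPUS] = { true, true, true, true };
static unsigned int model_trace_hits;	/* number of traced flushes */

/* Hypothetical stand-in for cpu_online(). */
bool model_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_online[cpu];
}

/* Hypothetical stand-in for trace_tlb_flush(): it only counts calls. */
void model_trace_tlb_flush(int cpu)
{
	model_trace_hits++;
	printf("tlb flush traced on CPU %d\n", cpu);
}

/* Models the first trace call in switch_mm() with the guard applied. */
void model_switch_mm(int cpu)
{
	if (model_cpu_online(cpu))
		model_trace_tlb_flush(cpu);
}

/* Marks a CPU offline, as would have happened before idle_task_exit(). */
void model_cpu_offline(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_online[cpu] = false;
}
```

Calling model_switch_mm(1) before and after model_cpu_offline(1)
bumps the counter only the first time, which is the behaviour the
guard is meant to give an offline CPU.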

> > [ 1144.486109]  [] idle_task_exit+0x205/0x2c0
> > [ 1144.486113]  [] play_dead_common+0xe/0x50
> > [ 1144.486116]  [] native_play_dead+0x15/0x140
> > [ 1144.486121]  [] arch_cpu_idle_dead+0xf/0x20
> > [ 1144.486123]  [] cpu_startup_entry+0x37e/0x580
> > [ 1144.486126]  [] start_secondary+0x140/0x150
> > [ 1144.502920] intel_pstate CPU 2 exiting
> > ...
> > 
> > Not sure if this comes from the rcu or pm/intel_pstate area.
> 
> New intel_pstate commits in linux-next are between 7ab0256e57ae and
> a04759924e25 inclusive.  Please check that range first.
> 
> If that doesn't point you to the offender, you can pull the linux-next
> branch of the linux-pm.git tree at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> 
> and see if that alone triggers the issue for you.  If not, the offender is
> not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
> check the acpi-processor merge point too.

This is almost certainly RCU getting more strict about CPUs using RCU
while offline.

Thanx, Paul


Re: linux-next: Tree for Feb 4

On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell  
> wrote:
> > Hi all,
> >
> > The next release I will be making will be next-20150209 - which will
> > probably be after the v3.19 release.
> >
> > Changes since 20150203:
> >
> > The sound-asoc tree gained a conflict against the sound tree.
> >
> > The scsi tree gained a build failure caused by an interaction with the
> > driver-core tree.  I applied a merge fix patch.
> >
> > The akpm-current tree gained a build failure for which I disabled
> > CONFIG_KASAN.
> >
> > Non-merge commits (relative to Linus' tree): 7461
> >  7314 files changed, 309736 insertions(+), 172363 deletions(-)
> >
> > 
> >
> 
> [ CC linux-rcu | linux-pm | intel_pstate maintainers ]

Dirk is not the maintainer of intel_pstate any more, CC: Kristen.

> Hi,
> 
> after suspend-and-resume I see the following call-trace:

Do you see that after CPU1 offline too? 

> ...
> [ 1144.482666] Disabling non-boot CPUs ...
> [ 1144.483000] intel_pstate CPU 1 exiting
> [ 1144.486064]
> [ 1144.486065] ===
> [ 1144.486067] smpboot: CPU 1 didn't die...
> [ 1144.486067] [ INFO: suspicious RCU usage. ]
> [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
> [ 1144.486070] ---
> [ 1144.486072] include/trace/events/tlb.h:35 suspicious
> rcu_dereference_check() usage!
> [ 1144.486073]
> [ 1144.486073] other info that might help us debug this:
> [ 1144.486073]
> [ 1144.486074]
> [ 1144.486074] RCU used illegally from offline CPU!
> [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
> [ 1144.486076] no locks held by swapper/1/0.
> [ 1144.486076]
> [ 1144.486076] stack backtrace:
> [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> 3.19.0-rc7-next-20150204.1-iniza-small #1
> [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [ 1144.486085]  0001 88011a44fe18 817e370d
> 0011
> [ 1144.486088]  88011a448290 88011a44fe48 810d6847
> 8800c66b9600
> [ 1144.486091]  0001 88011a44c000 81cb3900
> 88011a44fe78
> [ 1144.486092] Call Trace:
> [ 1144.486099]  [] dump_stack+0x4c/0x65
> [ 1144.486104]  [] lockdep_rcu_suspicious+0xe7/0x120
> [ 1144.486109]  [] idle_task_exit+0x205/0x2c0
> [ 1144.486113]  [] play_dead_common+0xe/0x50
> [ 1144.486116]  [] native_play_dead+0x15/0x140
> [ 1144.486121]  [] arch_cpu_idle_dead+0xf/0x20
> [ 1144.486123]  [] cpu_startup_entry+0x37e/0x580
> [ 1144.486126]  [] start_secondary+0x140/0x150
> [ 1144.502920] intel_pstate CPU 2 exiting
> ...
> 
> Not sure if this comes from the rcu or pm/intel_pstate area.

New intel_pstate commits in linux-next are between 7ab0256e57ae and
a04759924e25 inclusive.  Please check that range first.

If that doesn't point you to the offender, you can pull the linux-next
branch of the linux-pm.git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next

and see if that alone triggers the issue for you.  If not, the offender is
not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
check the acpi-processor merge point too.


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: linux-next: Tree for Feb 4

On Wed, Feb 4, 2015 at 4:58 PM, Martin K. Petersen
 wrote:
>> "Sedat" == Sedat Dilek  writes:
>
>> I am seeing the following in my logs several times...
>>
>> Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344
>> Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed.  Manually zeroing.
>
> Sedat> What's the plan... s/pr_warn/pr_debug ?
>
> The rationale here is that we'd like to log (once) if discard or write
> same fail on a given device.
>
> In SCSI we disable these commands if they get failed by the storage. But
> it looks like loop keeps advertising discard support after a failure.
>
> Is your loop device encrypted? Do you know why the discard is failing?
>

No, it is not encrypted, but I am running a so-called WUBI installation,
which has triggered some bugs before because it is an exotic setup.
My Ubuntu/precise is an 18GiB image lying on my Win7 partition (/dev/sda2).

How can I check or debug why the discard is failing?

- Sedat -

P.S.: Some diagnostics

$ LC_ALL=C df -T
Filesystem     Type     1K-blocks      Used Available Use% Mounted on
rootfs         rootfs    17753424  15663216   1165332  94% /
udev           devtmpfs   1959324         4   1959320   1% /dev
tmpfs          tmpfs       393888       904    392984   1% /run
/dev/sda2      fuseblk  465546236 161295260 304250976  35% /host
/dev/loop0     ext4      17753424  15663216   1165332  94% /
none           tmpfs         5120         4      5116   1% /run/lock
none           tmpfs      1969428       176   1969252   1% /run/shm

$ cat /etc/fstab
# /etc/fstab: static file system information.

# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).

# <file system>               <mount point>  <type>  <options>               <dump>  <pass>
proc                          /proc          proc    nodev,noexec,nosuid     0       0
/host/ubuntu/disks/root.disk  /              ext4    loop,errors=remount-ro  0       1
/host/ubuntu/disks/swap.disk  none           swap    loop,sw                 0       0

$ LC_ALL=C sudo fdisk -l /dev/sda
[sudo] password for wearefam:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcb9885ab

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048      206847      102400    7  HPFS/NTFS/exFAT
/dev/sda2          206848   931299327   465546240    7  HPFS/NTFS/exFAT
/dev/sda3       931299328   976773119    22736896   27  Hidden NTFS WinRE

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.13.0-45-generic root=UUID=001AADA61AAD9964
loop=/ubuntu/disks/root.disk ro

Re: linux-next: Tree for Feb 4

2015-02-04 Thread Martin K. Petersen

> "Sedat" == Sedat Dilek  writes:

> I am seeing the following in my logs several times...
> 
> Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344
> Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed.  Manually zeroing.

Sedat> What's the plan... s/pr_warn/pr_debug ?

The rationale here is that we'd like to log (once) if discard or write
same fail on a given device.

In SCSI we disable these commands if they get failed by the storage. But
it looks like loop keeps advertising discard support after a failure.
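
The two ideas in the paragraphs above (log a failure once per device,
and stop issuing a command that the storage has failed, as SCSI does)
can be combined in a small userspace sketch.  This is an illustration
under stated assumptions, not what loop or blkdev_issue_zeroout()
actually does: struct model_bdev and both functions are hypothetical
names, and the "discard" is just a flag check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a block device; not a kernel structure. */
struct model_bdev {
	const char *name;
	bool discard_advertised;	/* device claims discard support */
	bool discard_works;		/* what the backing store really does */
	unsigned int warnings;		/* messages logged so far */
	unsigned int manual_zeroes;	/* fallbacks to writing zeroes */
};

/* Pretends to issue a discard; returns 0 on success, -1 on failure. */
int model_issue_discard(struct model_bdev *bdev)
{
	return bdev->discard_works ? 0 : -1;
}

/*
 * Zeroes a range: tries discard while it is still advertised.  On the
 * first failure it logs once, stops advertising discard, and falls back
 * to writing zeroes.  Later calls skip the discard attempt entirely.
 */
void model_issue_zeroout(struct model_bdev *bdev)
{
	assert(bdev != NULL);

	if (bdev->discard_advertised) {
		if (model_issue_discard(bdev) == 0)
			return;
		bdev->discard_advertised = false;
		bdev->warnings++;
		printf("%s: DISCARD failed. Manually zeroing.\n", bdev->name);
	}
	bdev->manual_zeroes++;
}
```

With a device whose discard always fails, three calls to
model_issue_zeroout() log a single message and zero manually three
times, instead of repeating the message on every request.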

Is your loop device encrypted? Do you know why the discard is failing?

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: linux-next: Tree for Feb 4

On Wed, Feb 4, 2015 at 4:31 PM, Jens Axboe  wrote:
> On 02/04/2015 08:21 AM, Sedat Dilek wrote:
>>
>> On Wed, Feb 4, 2015 at 4:16 PM, Jens Axboe  wrote:
>>>
>>> On 02/04/2015 05:26 AM, Sedat Dilek wrote:
>>>>
>>>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell
>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> The next release I will be making will be next-20150209 - which will
>>>>> probably be after the v3.19 release.
>>>>>
>>>>> Changes since 20150203:
>>>>>
>>>>> The sound-asoc tree gained a conflict against the sound tree.
>>>>>
>>>>> The scsi tree gained a build failure caused by an interaction with the
>>>>> driver-core tree.  I applied a merge fix patch.
>>>>>
>>>>> The akpm-current tree gained a build failure for which I disabled
>>>>> CONFIG_KASAN.
>>>>>
>>>>> Non-merge commits (relative to Linus' tree): 7461
>>>>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>>>>>
>>>>
>>>> [ CC Jens ]
>>>>
>>>> Hi,
>>>>
>>>> I am seeing the following in my logs several times...
>>>>
>>>> Feb  4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O
>>>> error, dev loop0, sector 21261344
>>>> Feb  4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed.
>>>> Manually zeroing.
>>>
>>>
>>>
>>> This is from Martin's commit (CC'ed). Martin, there are various ways we
>>> can
>>> end up "failing" from blkdev_issue_discard(), I'm going to kill those
>>> debug
>>> warnings.
>>>
>>
>> [ Really CC Martin :-) ]
>
>
> Ooops, thanks :-)
>
>> Caused by this one...?
>>
>> commit d93ba7a5a97c9f315bacdcdb8de4e5f368e7b396
>> "block: Add discard flag to blkdev_issue_zeroout() function"
>
>
> That's the one.
>

What's the plan... s/pr_warn/pr_debug ?

- Sedat -

Re: linux-next: Tree for Feb 4

2015-02-04 Thread Jens Axboe


On 02/04/2015 08:21 AM, Sedat Dilek wrote:
> On Wed, Feb 4, 2015 at 4:16 PM, Jens Axboe wrote:
>> On 02/04/2015 05:26 AM, Sedat Dilek wrote:
>>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell
>>> wrote:
>>>> Hi all,
>>>>
>>>> The next release I will be making will be next-20150209 - which will
>>>> probably be after the v3.19 release.
>>>>
>>>> Changes since 20150203:
>>>>
>>>> The sound-asoc tree gained a conflict against the sound tree.
>>>>
>>>> The scsi tree gained a build failure caused by an interaction with the
>>>> driver-core tree.  I applied a merge fix patch.
>>>>
>>>> The akpm-current tree gained a build failure for which I disabled
>>>> CONFIG_KASAN.
>>>>
>>>> Non-merge commits (relative to Linus' tree): 7461
>>>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>>>
>>> [ CC Jens ]
>>>
>>> Hi,
>>>
>>> I am seeing the following in my logs several times...
>>>
>>> Feb  4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O
>>> error, dev loop0, sector 21261344
>>> Feb  4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed.
>>> Manually zeroing.
>>
>> This is from Martin's commit (CC'ed). Martin, there are various ways we can
>> end up "failing" from blkdev_issue_discard(), I'm going to kill those debug
>> warnings.
>
> [ Really CC Martin :-) ]

Ooops, thanks :-)

> Caused by this one...?
>
> commit d93ba7a5a97c9f315bacdcdb8de4e5f368e7b396
> "block: Add discard flag to blkdev_issue_zeroout() function"

That's the one.

--
Jens Axboe


Re: linux-next: Tree for Feb 4

On Wed, Feb 4, 2015 at 4:16 PM, Jens Axboe  wrote:
> On 02/04/2015 05:26 AM, Sedat Dilek wrote:
>>
>> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell 
>> wrote:
>>>
>>> Hi all,
>>>
>>> The next release I will be making will be next-20150209 - which will
>>> probably be after the v3.19 release.
>>>
>>> Changes since 20150203:
>>>
>>> The sound-asoc tree gained a conflict against the sound tree.
>>>
>>> The scsi tree gained a build failure caused by an interaction with the
>>> driver-core tree.  I applied a merge fix patch.
>>>
>>> The akpm-current tree gained a build failure for which I disabled
>>> CONFIG_KASAN.
>>>
>>> Non-merge commits (relative to Linus' tree): 7461
>>>   7314 files changed, 309736 insertions(+), 172363 deletions(-)
>>>
>>>
>>> 
>>>
>>
>> [ CC Jens ]
>>
>> Hi,
>>
>> I am seeing the following in my logs several times...
>>
>> Feb  4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O
>> error, dev loop0, sector 21261344
>> Feb  4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed.
>> Manually zeroing.
>
>
> This is from Martin's commit (CC'ed). Martin, there are various ways we can
> end up "failing" from blkdev_issue_discard(), I'm going to kill those debug
> warnings.
>

[ Really CC Martin :-) ]

Caused by this one...?

commit d93ba7a5a97c9f315bacdcdb8de4e5f368e7b396
"block: Add discard flag to blkdev_issue_zeroout() function"

- Sedat -

[1] 
http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/commit/?h=for-next=d93ba7a5a97c9f315bacdcdb8de4e5f368e7b396

Re: linux-next: Tree for Feb 4

2015-02-04 Thread Jens Axboe


On 02/04/2015 05:26 AM, Sedat Dilek wrote:
> On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell wrote:
>> Hi all,
>>
>> The next release I will be making will be next-20150209 - which will
>> probably be after the v3.19 release.
>>
>> Changes since 20150203:
>>
>> The sound-asoc tree gained a conflict against the sound tree.
>>
>> The scsi tree gained a build failure caused by an interaction with the
>> driver-core tree.  I applied a merge fix patch.
>>
>> The akpm-current tree gained a build failure for which I disabled
>> CONFIG_KASAN.
>>
>> Non-merge commits (relative to Linus' tree): 7461
>>  7314 files changed, 309736 insertions(+), 172363 deletions(-)
>
> [ CC Jens ]
>
> Hi,
>
> I am seeing the following in my logs several times...
>
> Feb  4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O
> error, dev loop0, sector 21261344
> Feb  4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed.
> Manually zeroing.

This is from Martin's commit (CC'ed). Martin, there are various ways we
can end up "failing" from blkdev_issue_discard(), I'm going to kill
those debug warnings.


--
Jens Axboe


linux-next: Tree for Feb 4

2015-02-04 Thread Stephen Rothwell

Hi all,

The next release I will be making will be next-20150209 - which will
probably be after the v3.19 release.

Changes since 20150203:

The sound-asoc tree gained a conflict against the sound tree.

The scsi tree gained a build failure caused by an interaction with the
driver-core tree.  I applied a merge fix patch.

The akpm-current tree gained a build failure for which I disabled
CONFIG_KASAN.

Non-merge commits (relative to Linus' tree): 7461
 7314 files changed, 309736 insertions(+), 172363 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 206 trees (counting Linus' and 30 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (0f98c38d725f Merge branch 'for-linus' of 
git://git.kernel.dk/linux-block)
Merging fixes/master (b94d525e58dc Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging kbuild-current/rc-fixes (a16c5f99a28c kbuild: Fix removal of the 
debian/ directory)
Merging arc-current/for-curr (2ce7598c9a45 Linux 3.17-rc4)
Merging arm-current/fixes (8e6480667246 ARM: 8299/1: mm: ensure local active 
ASID is marked as allocated on rollover)
CONFLICT (content): Merge conflict in arch/arm/mm/dma-mapping.c
Merging m68k-current/for-linus (f27bd5bfeda5 m68k: Wire up execveat)
Merging metag-fixes/fixes (ffe6902b66aa asm-generic: remove _STK_LIM_MAX)
Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5)
Merging powerpc-merge/merge (31345e1a071e powerpc/pci: Remove unused 
force_32bit_msi quirk)
Merging powerpc-merge-mpe/fixes (c59c961ca511 Merge branch 'drm-fixes' of 
git://people.freedesktop.org/~airlied/linux)
Merging sparc/master (66d0f7ec9f10 sparc32: destroy_context() and switch_mm() 
needs to disable interrupts.)
Merging net/master (42b5212fee4f xen-netback: stop the guest rx thread after a 
fatal error)
Merging ipsec/master (59343cd7c480 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging sound-current/for-linus (4161b4505f16 ALSA: ak411x: Fix stall in work 
callback)
Merging pci-current/for-linus (51ac3d2f0c50 PCI: Add NEC variants to Stratus 
ftServer PCIe DMI check)
Merging wireless-drivers/master (e3f31175a3ee ath9k: fix race condition in irq 
processing during hardware reset)
Merging driver-core.current/driver-core-linus (26bc420b59a3 Linux 3.19-rc6)
Merging tty.current/tty-linus (ec6f34e5b552 Linux 3.19-rc5)
Merging usb.current/usb-linus (e36f014edff7 Linux 3.19-rc7)
Merging usb-gadget-fixes/fixes (0df8fc37f6e4 usb: phy: never defer probe in 
non-OF case)
Merging usb-serial-fixes/usb-linus (a6f0331236fa USB: cp210x: add ID for 
RUGGEDCOM USB Serial Console)
Merging staging.current/staging-linus (e36f014edff7 Linux 3.19-rc7)
Merging char-misc.current/char-misc-linus (e36f014edff7 Linux 3.19-rc7)
Merging input-current/for-linus (47c1ffb2b6b6 Input: elantech - add more 
Fujtisu notebooks to force crc_enabled)
Merging crypto-current/master (3e14dcf7cb80 crypto: add missing crypto module 
aliases)
Merging ide/master (f96fe225677b Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging devicetree-current/devicetree/merge (6b1271de3723 of/unittest: Overlays 
with sub-devices tests)
Merging rr-fixes/fixes (d5db139ab376 module: make module_refcount() a signed 
integer.)
Merging vfio-fixes/for-linus (7c2e211f3c95 vfio-pci: Fix the check on pci 
device type in vfio_pci_probe())
Merging kselftest-fixes/fixes (f5db310d77ef selftests/vm: fix link error for 
transhuge-stress test)
Merging

Re: linux-next: Tree for Feb 4

On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
 On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au 
 wrote:
  Hi all,
 
  The next release I will be making will be next-20150209 - which will
  probably be after the v3.19 release.
 
  Changes since 20150203:
 
  The sound-asoc tree gained a conflict against the sound tree.
 
  The scsi tree gained a build failure caused by an interaction with the
  driver-core tree.  I applied a merge fix patch.
 
  The akpm-current tree gained a build failure for which I disabled
  CONFIG_KASAN.
 
  Non-merge commits (relative to Linus' tree): 7461
   7314 files changed, 309736 insertions(+), 172363 deletions(-)
 
  
 
 
 [ CC linux-rcu | linux-pm | intel_pstate maintainers ]

Dirk is not the maintainer of intel_pstate any more, CC: Kristen.

 Hi,
 
 after suspend-and-resume I see the following call-trace:

Do you see that after CPU1 offline too? 

 ...
 [ 1144.482666] Disabling non-boot CPUs ...
 [ 1144.483000] intel_pstate CPU 1 exiting
 [ 1144.486064]
 [ 1144.486065] ===
 [ 1144.486067] smpboot: CPU 1 didn't die...
 [ 1144.486067] [ INFO: suspicious RCU usage. ]
 [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
 [ 1144.486070] ---
 [ 1144.486072] include/trace/events/tlb.h:35 suspicious
 rcu_dereference_check() usage!
 [ 1144.486073]
 [ 1144.486073] other info that might help us debug this:
 [ 1144.486073]
 [ 1144.486074]
 [ 1144.486074] RCU used illegally from offline CPU!
 [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
 [ 1144.486076] no locks held by swapper/1/0.
 [ 1144.486076]
 [ 1144.486076] stack backtrace:
 [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.1-iniza-small #1
 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [ 1144.486085]  0001 88011a44fe18 817e370d
 0011
 [ 1144.486088]  88011a448290 88011a44fe48 810d6847
 8800c66b9600
 [ 1144.486091]  0001 88011a44c000 81cb3900
 88011a44fe78
 [ 1144.486092] Call Trace:
 [ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
 [ 1144.486104]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
 [ 1144.486109]  [810b71a5] idle_task_exit+0x205/0x2c0
 [ 1144.486113]  [81054c4e] play_dead_common+0xe/0x50
 [ 1144.486116]  [81054ca5] native_play_dead+0x15/0x140
 [ 1144.486121]  [8102963f] arch_cpu_idle_dead+0xf/0x20
 [ 1144.486123]  [810cd89e] cpu_startup_entry+0x37e/0x580
 [ 1144.486126]  [81053e20] start_secondary+0x140/0x150
 [ 1144.502920] intel_pstate CPU 2 exiting
 ...
 
 Not sure if this comes from the rcu or pm/intel_pstate area.

New intel_pstate commits in linux-next are between 7ab0256e57ae and
a04759924e25 inclusive.  Please check that range first.

If that doesn't point you to the offender, you can pull the linux-next
branch of the linux-pm.git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next

and see if that alone triggers the issue for you.  If not, the offender is
not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
check the acpi-processor merge point too.


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: linux-next: Tree for Feb 4

2015-02-04 Thread Martin K. Petersen

 Sedat == Sedat Dilek sedat.di...@gmail.com writes:

 I am seeing the following in my logs several times...
 
 Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344
 Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed.  Manually zeroing.

Sedat> What's the plan... s/pr_warn/pr_debug ?

The rationale here is that we'd like to log (once) if discard or write
same fail on a given device.

In SCSI we disable these commands if they get failed by the storage. But
it looks like loop keeps advertising discard support after a failure.
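
The difference between the two behaviours can be sketched in a small
userspace model.  This is a minimal sketch only: struct model_dev,
model_issue_discard() and model_backend_fail() are hypothetical names made up
for illustration, not loop or SCSI driver code.  It logs the first failure
once and then stops advertising discard, roughly the SCSI-style behaviour
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device model; not a kernel structure. */
struct model_dev {
	const char *name;
	bool discard_supported;	/* what the device currently advertises */
	bool warned;		/* true once the failure has been logged */
	int attempts;		/* discards actually sent to the backend */
};

/* Hypothetical backend that always rejects the discard. */
static int model_backend_fail(void)
{
	return -1;
}

/*
 * Returns 0 if the discard succeeded, 1 if the caller has to fall back
 * to zeroing the range manually.
 */
static int model_issue_discard(struct model_dev *dev, int (*backend)(void))
{
	assert(dev && backend);

	if (!dev->discard_supported)
		return 1;		/* no longer advertised: skip the attempt */

	dev->attempts++;
	if (backend() == 0)
		return 0;

	if (!dev->warned) {		/* log only the first failure */
		fprintf(stderr, "%s: DISCARD failed.  Manually zeroing.\n",
			dev->name);
		dev->warned = true;
	}
	dev->discard_supported = false;	/* stop advertising after a failure */
	return 1;
}
```

With a backend that always fails, repeated calls produce one log line and one
attempt.  If the assignment to discard_supported were dropped, every call
would reach the backend again, which would be consistent with a device that
keeps advertising discard after a failure.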

Is your loop device encrypted? Do you know why the discard is failing?

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: linux-next: Tree for Feb 4

On Wed, Feb 4, 2015 at 4:58 PM, Martin K. Petersen
martin.peter...@oracle.com wrote:
 Sedat == Sedat Dilek sedat.di...@gmail.com writes:

 I am seeing the following in my logs several times...

 Feb 4 02:53:13 fambox kernel: [15507.397482] blk_update_request: I/O error, dev loop0, sector 21261344
 Feb 4 02:53:13 fambox kernel: [15507.397531] loop0: DISCARD failed.  Manually zeroing.

 Sedat What's the plan... s/pr_warn/pr_debug ?

 The rationale here is that we'd like to log (once) if discard or write
 same fail on a given device.

 In SCSI we disable these commands if they get failed by the storage. But
 it looks like loop keeps advertising discard support after a failure.

 Is your loop device encrypted? Do you know why the discard is failing?


No, but I am on a so-called WUBI installation, which has triggered
some bugs because it is an exotic setup.
My Ubuntu/precise is an 18 GiB image lying on my Win7 partition (/dev/sda2).

How can I check or debug the discard failing?
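
One starting point, sketched here under assumptions, is to look at what the
block queue advertises.  The kernel exposes a per-queue discard_max_bytes
attribute in sysfs (for this setup presumably
/sys/block/loop0/queue/discard_max_bytes), and a value of 0 means discard is
not supported.  The helper names read_u64_attr() and
queue_advertises_discard() below are made up for illustration.  This only
shows what is advertised, not why a particular discard fails.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read a single unsigned integer from a sysfs-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_u64_attr(const char *path, uint64_t *out)
{
	FILE *f;
	int ok;

	assert(path && out);

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%" SCNu64, out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Returns 1 if the attribute is nonzero (discard advertised),
 * 0 if it is zero, -1 if it could not be read.
 */
static int queue_advertises_discard(const char *attr_path)
{
	uint64_t max_bytes;

	if (read_u64_attr(attr_path, &max_bytes) != 0)
		return -1;
	return max_bytes != 0;
}
```

Calling queue_advertises_discard() on the attribute path before and after the
first failure would show whether the device still claims discard support.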

- Sedat -

P.S.: Some diagnostics

$ LC_ALL=C df -T
Filesystem     Type     1K-blocks      Used Available Use% Mounted on
rootfs         rootfs    17753424  15663216   1165332  94% /
udev           devtmpfs   1959324         4   1959320   1% /dev
tmpfs          tmpfs       393888       904    392984   1% /run
/dev/sda2      fuseblk  465546236 161295260 304250976  35% /host
/dev/loop0     ext4      17753424  15663216   1165332  94% /
none           tmpfs         5120         4      5116   1% /run/lock
none           tmpfs      1969428       176   1969252   1% /run/shm

$ cat /etc/fstab
# /etc/fstab: static file system information.

# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).

# <file system>              <mount point>  <type>  <options>               <dump>  <pass>
proc                         /proc          proc    nodev,noexec,nosuid     0       0
/host/ubuntu/disks/root.disk /              ext4    loop,errors=remount-ro  0       1
/host/ubuntu/disks/swap.disk none           swap    loop,sw                 0       0

$ LC_ALL=C sudo fdisk -l /dev/sda
[sudo] password for wearefam:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcb9885ab

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048      206847      102400    7  HPFS/NTFS/exFAT
/dev/sda2          206848   931299327   465546240    7  HPFS/NTFS/exFAT
/dev/sda3       931299328   976773119    22736896   27  Hidden NTFS WinRE

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.13.0-45-generic root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro

Re: linux-next: Tree for Feb 4

On Wednesday, February 04, 2015 11:46:32 PM Sedat Dilek wrote:
 On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:
  On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
  On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au 
  wrote:
   Hi all,
  
   The next release I will be making will be next-20150209 - which will
   probably be after the v3.19 release.
  
   Changes since 20150203:
  
   The sound-asoc tree gained a conflict against the sound tree.
  
   The scsi tree gained a build failure caused by an interaction with the
   driver-core tree.  I applied a merge fix patch.
  
   The akpm-current tree gained a build failure for which I disabled
   CONFIG_KASAN.
  
   Non-merge commits (relative to Linus' tree): 7461
7314 files changed, 309736 insertions(+), 172363 deletions(-)
  
   
  
 
  [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
 
  Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
 
  Hi,
 
  after suspend-and-resume I see the following call-trace:
 
  Do you see that after CPU1 offline too?
 
 
 NO.
 
 After...
 
 root# echo 0 > /sys/devices/system/cpu/cpu1/online
 
 ...I see this:
 
 +[  707.936668] PM: Saving platform NVS memory
 +[  707.936674] Disabling non-boot CPUs ...
 +[  707.936712] intel_pstate CPU 2 exiting
 +[  707.938024] smpboot: CPU 2 didn't die...
 +[  707.949128] intel_pstate CPU 3 exiting
 +[  707.950369] smpboot: CPU 3 didn't die...
 +[  707.966248] ACPI: Low-level resume complete
 +[  707.966302] PM: Restoring platform NVS memory
 
 Full dmesg attached.

The dmesg doesn't match what you said above.

Anyway, that's not what I meant.  Does the CPU1 offlining alone:

# echo 0 > /sys/devices/system/cpu/cpu1/online

trigger the trace?  It should.


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 12:30 AM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 On Wednesday, February 04, 2015 11:46:32 PM Sedat Dilek wrote:
 On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net 
 wrote:
  On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
  On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au 
  wrote:
   Hi all,
  
   The next release I will be making will be next-20150209 - which will
   probably be after the v3.19 release.
  
   Changes since 20150203:
  
   The sound-asoc tree gained a conflict against the sound tree.
  
   The scsi tree gained a build failure caused by an interaction with the
   driver-core tree.  I applied a merge fix patch.
  
   The akpm-current tree gained a build failure for which I disabled
   CONFIG_KASAN.
  
   Non-merge commits (relative to Linus' tree): 7461
7314 files changed, 309736 insertions(+), 172363 deletions(-)
  
   
  
 
  [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
 
  Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
 
  Hi,
 
  after suspend-and-resume I see the following call-trace:
 
  Do you see that after CPU1 offline too?
 

 NO.

 After...

 root# echo 0 > /sys/devices/system/cpu/cpu1/online

 ...I see this:

 +[  707.936668] PM: Saving platform NVS memory
 +[  707.936674] Disabling non-boot CPUs ...
 +[  707.936712] intel_pstate CPU 2 exiting
 +[  707.938024] smpboot: CPU 2 didn't die...
 +[  707.949128] intel_pstate CPU 3 exiting
 +[  707.950369] smpboot: CPU 3 didn't die...
 +[  707.966248] ACPI: Low-level resume complete
 +[  707.966302] PM: Restoring platform NVS memory

 Full dmesg attached.

 The dmesg doesn't match what you said above.

 Anyway, that's not what I meant.  Does the CPU1 offlining alone:

 # echo 0 > /sys/devices/system/cpu/cpu1/online

 trigger the trace?  It should.


YES, I see this...
...
[   84.668616] PPP BSD Compression module registered
[   84.678072] PPP Deflate Compression module registered
[  101.143582] intel_pstate CPU 1 exiting
[  101.157134]
[  101.157135] ===
[  101.157136] [ INFO: suspicious RCU usage. ]
[  101.157139] 3.19.0-rc7-next-20150204.3-iniza-small #1 Not tainted
[  101.157140] ---
[  101.157142] include/trace/events/tlb.h:35 suspicious
rcu_dereference_check() usage!
[  101.157142]
[  101.157142] other info that might help us debug this:
[  101.157142]
[  101.157143]
[  101.157143] RCU used illegally from offline CPU!
[  101.157143] rcu_scheduler_active = 1, debug_locks = 0
[  101.157144] no locks held by swapper/1/0.
[  101.157144]
[  101.157144] stack backtrace:
[  101.157146] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
3.19.0-rc7-next-20150204.3-iniza-small #1
[  101.157147] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[  101.157151]  0001 88011a44fe18 817e35fd
0011
[  101.157153]  88011a448290 88011a44fe48 810d6847
8800d3b96100
[  101.157155]  0001 88011a44c000 0005
88011a44fe78
[  101.157156] Call Trace:
[  101.157162]  [817e35fd] dump_stack+0x4c/0x65
[  101.157166]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
[  101.157170]  [810b71a5] idle_task_exit+0x205/0x2c0
[  101.157173]  [81054c4e] play_dead_common+0xe/0x50
[  101.157175]  [81054ca5] native_play_dead+0x15/0x140
[  101.157179]  [8102963f] arch_cpu_idle_dead+0xf/0x20
[  101.157181]  [810cd89e] cpu_startup_entry+0x37e/0x580
[  101.157183]  [81053e20] start_secondary+0x140/0x150
[  101.157228] smpboot: CPU 1 is now offline

- Sedat -

Re: linux-next: Tree for Feb 4

On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
 On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
  On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
   On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:

[ . . . ]

 [ 1144.482666] Disabling non-boot CPUs ...
 [ 1144.483000] intel_pstate CPU 1 exiting
 [ 1144.486064]
 [ 1144.486065] ===
 [ 1144.486067] smpboot: CPU 1 didn't die...
 [ 1144.486067] [ INFO: suspicious RCU usage. ]
 [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
 [ 1144.486070] ---
 [ 1144.486072] include/trace/events/tlb.h:35 suspicious
 rcu_dereference_check() usage!
 [ 1144.486073]
 [ 1144.486073] other info that might help us debug this:
 [ 1144.486073]
 [ 1144.486074]
 [ 1144.486074] RCU used illegally from offline CPU!
 [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
 [ 1144.486076] no locks held by swapper/1/0.
 [ 1144.486076]
 [ 1144.486076] stack backtrace:
 [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.1-iniza-small #1
 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [ 1144.486085]  0001 88011a44fe18 817e370d
 0011
 [ 1144.486088]  88011a448290 88011a44fe48 810d6847
 8800c66b9600
 [ 1144.486091]  0001 88011a44c000 81cb3900
 88011a44fe78
 [ 1144.486092] Call Trace:
 [ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
 [ 1144.486104]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
   
   As near as I can tell, idle_task_exit() is running on an offline CPU,
   then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
   And RCU is objecting to being used from a CPU that it is ignoring.
   
   One approach would be to push RCU's idea of when the CPU goes offline
   down into arch code in this case, using some Kconfig symbol and
   the usual conditional compilation.  Another approach would be to
   invoke the trace calls under cpu_online(), for example, for the
   first such call in switch_mm():
   
 if (cpu_online(smp_processor_id()))
 trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
   
   The compiler would discard this if tracing was disabled.
  
  That looks less intrusive to me.
 
 One possible concern is increased context-switch path length, but that
 would only be the case where tracing is enabled by default.

Nevertheless, here is an untested patch.  Does it help?

Thanx, Paul



x86: Omit switch_mm() tracing for offline CPUs

The architecture-specific switch_mm() function can be called by offline
CPUs, but includes event tracing, which cannot be legally carried out
on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
this splat by omitting the tracing when the CPU is offline.

Reported-by: Sedat Dilek sedat.di...@gmail.com
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 40269a2bf6f9..7e7f2445fbc9 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -47,7 +47,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 
 		/* Re-load page tables */
 		load_cr3(next->pgd);
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+		if (cpu_online(smp_processor_id()))
+			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
@@ -84,7 +85,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
-			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+			if (cpu_online(smp_processor_id()))
+				trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 			load_LDT_nolock(&next->context);
 		}
 	}
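
The effect of the guard can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: every model_* name below is
hypothetical and stands in for the kernel facility of similar name; none of
it is kernel code.  The modelled tracepoint counts a "splat" whenever it is
entered while the current CPU is marked offline, which is the condition
lockdep-RCU complains about in the report.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins for kernel state. */
static bool model_online[MODEL_NR_CPUS] = { true, true, true, true };
static int model_this_cpu;	/* stands in for smp_processor_id() */
static int model_trace_events;	/* how often the tracepoint ran */
static int model_rcu_splats;	/* tracepoint entered on an offline CPU */

static bool model_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_online[cpu];
}

/* Models a tracepoint whose implementation relies on RCU. */
static void model_trace_tlb_flush(void)
{
	if (!model_cpu_online(model_this_cpu))
		model_rcu_splats++;	/* "RCU used illegally from offline CPU!" */
	model_trace_events++;
}

/* Behaviour before the patch: always trace. */
static void model_switch_mm_unguarded(void)
{
	model_trace_tlb_flush();
}

/* Behaviour with the patch: trace only while the CPU is online. */
static void model_switch_mm_guarded(void)
{
	if (model_cpu_online(model_this_cpu))
		model_trace_tlb_flush();
}
```

Marking CPU 1 offline and calling model_switch_mm_unguarded() on it bumps
model_rcu_splats, whereas model_switch_mm_guarded() leaves both counters
untouched until the CPU is marked online again.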


Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 1:10 AM, Paul E. McKenney
paul...@linux.vnet.ibm.com wrote:
 On Wed, Feb 04, 2015 at 03:51:15PM -0800, Paul E. McKenney wrote:
 On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
  On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
   On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:

 [ . . . ]

 [ 1144.482666] Disabling non-boot CPUs ...
 [ 1144.483000] intel_pstate CPU 1 exiting
 [ 1144.486064]
 [ 1144.486065] ===
 [ 1144.486067] smpboot: CPU 1 didn't die...
 [ 1144.486067] [ INFO: suspicious RCU usage. ]
 [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
 [ 1144.486070] ---
 [ 1144.486072] include/trace/events/tlb.h:35 suspicious
 rcu_dereference_check() usage!
 [ 1144.486073]
 [ 1144.486073] other info that might help us debug this:
 [ 1144.486073]
 [ 1144.486074]
 [ 1144.486074] RCU used illegally from offline CPU!
 [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
 [ 1144.486076] no locks held by swapper/1/0.
 [ 1144.486076]
 [ 1144.486076] stack backtrace:
 [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.1-iniza-small #1
 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [ 1144.486085]  0001 88011a44fe18 817e370d
 0011
 [ 1144.486088]  88011a448290 88011a44fe48 810d6847
 8800c66b9600
 [ 1144.486091]  0001 88011a44c000 81cb3900
 88011a44fe78
 [ 1144.486092] Call Trace:
 [ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
 [ 1144.486104]  [810d6847] 
 lockdep_rcu_suspicious+0xe7/0x120
  
   As near as I can tell, idle_task_exit() is running on an offline CPU,
   then calling switch_mm() which contains trace_tlb_flush(), which uses 
   RCU.
   And RCU is objecting to being used from a CPU that it is ignoring.
  
   One approach would be to push RCU's idea of when the CPU goes offline
   down into arch code in this case, using some Kconfig symbol and
   the usual conditional compilation.  Another approach would be to
   invoke the trace calls under cpu_online(), for example, for the
   first such call in switch_mm():
  
 if (cpu_online(smp_processor_id()))
 trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
  
   The compiler would discard this if tracing was disabled.
 
  That looks less intrusive to me.

 One possible concern is increased context-switch path length, but that
 would only be the case where tracing is enabled by default.

 Nevertheless, here is an untested patch.  Does it help?

No bedtime :-)

I tried with a revert of...

commit 5f1dedac9adb6259bb7b62a923bd7c247a2f2d5b
rcu: Handle outgoing CPUs on exit from idle loop

...and offlining cpu1 does not seem to produce the trace...

[  115.280244] PPP BSD Compression module registered
[  115.288761] PPP Deflate Compression module registered
[  162.935524] intel_pstate CPU 1 exiting
[  162.949729] smpboot: CPU 1 is now offline

Will try the patch.

- Sedat -


 Thanx, Paul

 

 x86: Omit switch_mm() tracing for offline CPUs

 The architecture-specific switch_mm() function can be called by offline
 CPUs, but includes event tracing, which cannot be legally carried out
 on offline CPUs.  This results in a lockdep-RCU splat.  This commit fixes
 this splat by omitting the tracing when the CPU is offline.

 Reported-by: Sedat Dilek sedat.di...@gmail.com
 Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

 diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
 index 40269a2bf6f9..7e7f2445fbc9 100644
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -47,7 +47,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,

 		/* Re-load page tables */
 		load_cr3(next->pgd);
 -		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 +		if (cpu_online(smp_processor_id()))
 +			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);

 		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
 @@ -84,7 +85,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
 -   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,

Re: linux-next: Tree for Feb 4

On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
 On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
  On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
   On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au 
   wrote:
Hi all,
   
The next release I will be making will be next-20150209 - which will
probably be after the v3.19 release.
   
Changes since 20150203:
   
The sound-asoc tree gained a conflict against the sound tree.
   
The scsi tree gained a build failure caused by an interaction with the
driver-core tree.  I applied a merge fix patch.
   
The akpm-current tree gained a build failure for which I disabled
CONFIG_KASAN.
   
Non-merge commits (relative to Linus' tree): 7461
 7314 files changed, 309736 insertions(+), 172363 deletions(-)
   

   
   
   [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
  
  Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
  
   Hi,
   
   after suspend-and-resume I see the following call-trace:
  
  Do you see that after CPU1 offline too? 
  
   ...
   [ 1144.482666] Disabling non-boot CPUs ...
   [ 1144.483000] intel_pstate CPU 1 exiting
   [ 1144.486064]
   [ 1144.486065] ===
   [ 1144.486067] smpboot: CPU 1 didn't die...
   [ 1144.486067] [ INFO: suspicious RCU usage. ]
   [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
   [ 1144.486070] ---
   [ 1144.486072] include/trace/events/tlb.h:35 suspicious
   rcu_dereference_check() usage!
   [ 1144.486073]
   [ 1144.486073] other info that might help us debug this:
   [ 1144.486073]
   [ 1144.486074]
   [ 1144.486074] RCU used illegally from offline CPU!
   [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
   [ 1144.486076] no locks held by swapper/1/0.
   [ 1144.486076]
   [ 1144.486076] stack backtrace:
   [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
   3.19.0-rc7-next-20150204.1-iniza-small #1
   [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
   530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
   [ 1144.486085]  0001 88011a44fe18 817e370d
   0011
   [ 1144.486088]  88011a448290 88011a44fe48 810d6847
   8800c66b9600
   [ 1144.486091]  0001 88011a44c000 81cb3900
   88011a44fe78
   [ 1144.486092] Call Trace:
   [ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
   [ 1144.486104]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
 
 As near as I can tell, idle_task_exit() is running on an offline CPU,
 then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
 And RCU is objecting to being used from a CPU that it is ignoring.
 
 One approach would be to push RCU's idea of when the CPU goes offline
 down into arch code in this case, using some Kconfig symbol and
 the usual conditional compilation.  Another approach would be to
 invoke the trace calls under cpu_online(), for example, for the
 first such call in switch_mm():
 
   if (cpu_online(smp_processor_id()))
   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
 The compiler would discard this if tracing was disabled.

That looks less intrusive to me.

 Other thoughts?

Well, the whole issue here seems to be that common code using RCU is also
useful in places where RCU doesn't want to be used.  Arguably, we can deal
with all of those cases in a whack-a-mole manner, but that doesn't seem to
scale too well.

Rafael


Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 12:25 AM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 On Wednesday, February 04, 2015 11:38:40 PM Sedat Dilek wrote:
 On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net 
 wrote:
  On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
  On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au 
  wrote:
   Hi all,
  
   The next release I will be making will be next-20150209 - which will
   probably be after the v3.19 release.
  
   Changes since 20150203:
  
   The sound-asoc tree gained a conflict against the sound tree.
  
   The scsi tree gained a build failure caused by an interaction with the
   driver-core tree.  I applied a merge fix patch.
  
   The akpm-current tree gained a build failure for which I disabled
   CONFIG_KASAN.
  
   Non-merge commits (relative to Linus' tree): 7461
7314 files changed, 309736 insertions(+), 172363 deletions(-)
  
   
  
 
  [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
 
  Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
 

 Yupp, I forwarded my original posting before you answered me.

  Hi,
 
  after suspend-and-resume I see the following call-trace:
 
  Do you see that after CPU1 offline too?
 

 Did not check yet.

  ...
  [ 1144.482666] Disabling non-boot CPUs ...
  [ 1144.483000] intel_pstate CPU 1 exiting
  [ 1144.486064]
  [ 1144.486065] ===
  [ 1144.486067] smpboot: CPU 1 didn't die...
  [ 1144.486067] [ INFO: suspicious RCU usage. ]
  [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
  [ 1144.486070] ---
  [ 1144.486072] include/trace/events/tlb.h:35 suspicious
  rcu_dereference_check() usage!
  [ 1144.486073]
  [ 1144.486073] other info that might help us debug this:
  [ 1144.486073]
  [ 1144.486074]
  [ 1144.486074] RCU used illegally from offline CPU!
  [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
  [ 1144.486076] no locks held by swapper/1/0.
  [ 1144.486076]
  [ 1144.486076] stack backtrace:
  [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
  3.19.0-rc7-next-20150204.1-iniza-small #1
  [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
  530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
  [ 1144.486085]  0001 88011a44fe18 817e370d
  0011
  [ 1144.486088]  88011a448290 88011a44fe48 810d6847
  8800c66b9600
  [ 1144.486091]  0001 88011a44c000 81cb3900
  88011a44fe78
  [ 1144.486092] Call Trace:
  [ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
  [ 1144.486104]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
  [ 1144.486109]  [810b71a5] idle_task_exit+0x205/0x2c0
  [ 1144.486113]  [81054c4e] play_dead_common+0xe/0x50
  [ 1144.486116]  [81054ca5] native_play_dead+0x15/0x140
  [ 1144.486121]  [8102963f] arch_cpu_idle_dead+0xf/0x20
  [ 1144.486123]  [810cd89e] cpu_startup_entry+0x37e/0x580
  [ 1144.486126]  [81053e20] start_secondary+0x140/0x150
  [ 1144.502920] intel_pstate CPU 2 exiting
  ...
 
  Not sure if this comes from the rcu or pm/intel_pstate area.
 
  New intel_pstate commits in linux-next are between 7ab0256e57ae and
  a04759924e25 inclusive.  Please check that range first.
 

 Not sure if I am willing to test with reverted patches.
 ( /me was updating the Linux graphics driver stack today, built with the
 upcoming llvm-toolchain v3.6.0. )

  If that doesn't point you to the offender, you can pull the linux-next
  branch of the linux-pm.git tree at:
 
  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
 
  and see if that alone triggers the issue for you.  If not, the offender is
  not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
  check the acpi-processor merge point too.
 

 I pulled in pm-next-20150204 on top of next-20150204, but that did not help.

 What I was asking about was to test linux-pm.git/linux-next *instead* *of*
 full linux-next and not on top of it.  That would tell you whether or not
 the new trace was introduced by one of the PM commits or elsewhere.


No, I did not test this.

 But this most likely is what Paul said anyway.


Not sure what you mean by this statement.

I tried my -3 kernel with...

f64b348810c2 Revert "intel_pstate: Add support for SkyLake"
a0d825a39848 Revert "intel_pstate: expose turbo range to sysfs"
847153608ecf Revert "intel_pstate: Add num_pstates to sysfs"
412a6770cde4 Revert "intel_pstate: respect cpufreq policy request"
e2a6685023ed Revert "intel_pstate: honor user space min_perf_pct override on resume"

...and it shows the trace when offlining cpu1 (w/o doing a s/r).

- Sedat -

Re: linux-next: Tree for Feb 4

On Thu, Feb 5, 2015 at 12:51 AM, Paul E. McKenney
paul...@linux.vnet.ibm.com wrote:
 On Wed, Feb 04, 2015 at 11:59:31PM +0100, Rafael J. Wysocki wrote:
 On Wednesday, February 04, 2015 01:53:58 PM Paul E. McKenney wrote:
  On Wed, Feb 04, 2015 at 10:54:07PM +0100, Rafael J. Wysocki wrote:
   On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell 
s...@canb.auug.org.au wrote:
 Hi all,

 The next release I will be making will be next-20150209 - which will
 probably be after the v3.19 release.

 Changes since 20150203:

 The sound-asoc tree gained a conflict against the sound tree.

 The scsi tree gained a build failure caused by an interaction with 
 the
 driver-core tree.  I applied a merge fix patch.

 The akpm-current tree gained a build failure for which I disabled
 CONFIG_KASAN.

 Non-merge commits (relative to Linus' tree): 7461
  7314 files changed, 309736 insertions(+), 172363 deletions(-)

 

   
[ CC linux-rcu | linux-pm | intel_pstate maintainers ]
  
   Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
  
Hi,
   
after suspend-and-resume I see the following call-trace:
  
   Do you see that after CPU1 offline too?
  
...
[ 1144.482666] Disabling non-boot CPUs ...
[ 1144.483000] intel_pstate CPU 1 exiting
[ 1144.486064]
[ 1144.486065] ===
[ 1144.486067] smpboot: CPU 1 didn't die...
[ 1144.486067] [ INFO: suspicious RCU usage. ]
[ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
[ 1144.486070] ---
[ 1144.486072] include/trace/events/tlb.h:35 suspicious
rcu_dereference_check() usage!
[ 1144.486073]
[ 1144.486073] other info that might help us debug this:
[ 1144.486073]
[ 1144.486074]
[ 1144.486074] RCU used illegally from offline CPU!
[ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
[ 1144.486076] no locks held by swapper/1/0.
[ 1144.486076]
[ 1144.486076] stack backtrace:
[ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
3.19.0-rc7-next-20150204.1-iniza-small #1
[ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 1144.486085]  0001 88011a44fe18 817e370d
0011
[ 1144.486088]  88011a448290 88011a44fe48 810d6847
8800c66b9600
[ 1144.486091]  0001 88011a44c000 81cb3900
88011a44fe78
[ 1144.486092] Call Trace:
[ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
[ 1144.486104]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
 
  As near as I can tell, idle_task_exit() is running on an offline CPU,
  then calling switch_mm() which contains trace_tlb_flush(), which uses RCU.
  And RCU is objecting to being used from a CPU that it is ignoring.
 
  One approach would be to push RCU's idea of when the CPU goes offline
  down into arch code in this case, using some Kconfig symbol and
  the usual conditional compilation.  Another approach would be to
  invoke the trace calls under cpu_online(), for example, for the
  first such call in switch_mm():
 
  if (cpu_online(smp_processor_id()))
  trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 
  The compiler would discard this if tracing was disabled.

 That looks less intrusive to me.

 One possible concern is increased context-switch path length, but that
 would only be the case where tracing is enabled by default.


Hmmm, which kernel-config trace options do you mean in particular?

  Other thoughts?

 Well, the whole issue here seems to be that common code using RCU is also
 useful in places where RCU doesn't want to be used.  Arguably, we can deal
 with all of those cases in a whack-a-mole manner, but that doesn't seem to
 scale too well.

 Well, I did put a change into -next that makes these particular moles
 stick their heads up farther, so this is not a random event.  And in
 this particular case, we do have the option of extending RCU's reach to
 cover this operation, at the expense of a bit more intrusion by RCU into
 arch-specific code.  If tracing is enabled by default by major distros,
 that might be the right thing to do, unappealing though it might be.


Can you point me to that change in rcu-next?

 But yes, it would have been far better for RCU to have been picky to
 begin with, so that these issues could have been addressed as they were
 added to the kernel.  I guess one possible source of comfort is that once
 this is in place, future issues will make themselves immediately apparent.


Not sure what I can do now to help track this down.

It is 01:00 a.m. here - bedtime :-).

- Sedat

Re: linux-next: Tree for Feb 4

On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
 On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au 
 wrote:
  Hi all,
 
  The next release I will be making will be next-20150209 - which will
  probably be after the v3.19 release.
 
  Changes since 20150203:
 
  The sound-asoc tree gained a conflict against the sound tree.
 
  The scsi tree gained a build failure caused by an interaction with the
  driver-core tree.  I applied a merge fix patch.
 
  The akpm-current tree gained a build failure for which I disabled
  CONFIG_KASAN.
 
  Non-merge commits (relative to Linus' tree): 7461
   7314 files changed, 309736 insertions(+), 172363 deletions(-)
 
  
 

 [ CC linux-rcu | linux-pm | intel_pstate maintainers ]

 Dirk is not the maintainer of intel_pstate any more, CC: Kristen.


Yupp, I forwarded my original posting before you answered me.

 Hi,

 after suspend-and-resume I see the following call-trace:

 Do you see that after CPU1 offline too?


Did not check yet.

 ...
 [ 1144.482666] Disabling non-boot CPUs ...
 [ 1144.483000] intel_pstate CPU 1 exiting
 [ 1144.486064]
 [ 1144.486065] ===
 [ 1144.486067] smpboot: CPU 1 didn't die...
 [ 1144.486067] [ INFO: suspicious RCU usage. ]
 [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
 [ 1144.486070] ---
 [ 1144.486072] include/trace/events/tlb.h:35 suspicious
 rcu_dereference_check() usage!
 [ 1144.486073]
 [ 1144.486073] other info that might help us debug this:
 [ 1144.486073]
 [ 1144.486074]
 [ 1144.486074] RCU used illegally from offline CPU!
 [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
 [ 1144.486076] no locks held by swapper/1/0.
 [ 1144.486076]
 [ 1144.486076] stack backtrace:
 [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.1-iniza-small #1
 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [ 1144.486085]  0001 88011a44fe18 817e370d
 0011
 [ 1144.486088]  88011a448290 88011a44fe48 810d6847
 8800c66b9600
 [ 1144.486091]  0001 88011a44c000 81cb3900
 88011a44fe78
 [ 1144.486092] Call Trace:
 [ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
 [ 1144.486104]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
 [ 1144.486109]  [810b71a5] idle_task_exit+0x205/0x2c0
 [ 1144.486113]  [81054c4e] play_dead_common+0xe/0x50
 [ 1144.486116]  [81054ca5] native_play_dead+0x15/0x140
 [ 1144.486121]  [8102963f] arch_cpu_idle_dead+0xf/0x20
 [ 1144.486123]  [810cd89e] cpu_startup_entry+0x37e/0x580
 [ 1144.486126]  [81053e20] start_secondary+0x140/0x150
 [ 1144.502920] intel_pstate CPU 2 exiting
 ...

 Not sure if this comes from the rcu or pm/intel_pstate area.

 New intel_pstate commits in linux-next are between 7ab0256e57ae and
 a04759924e25 inclusive.  Please check that range first.


Not sure if I am willing to test with reverted patches.
( /me was updating the Linux graphics driver stack today, built with the
upcoming llvm-toolchain v3.6.0. )

 If that doesn't point you to the offender, you can pull the linux-next
 branch of the linux-pm.git tree at:

 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next

 and see if that alone triggers the issue for you.  If not, the offender is
 not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
 check the acpi-processor merge point too.


I pulled in pm-next-20150204 on top of next-20150204, but that did not help.

- Sedat -
Jiang Liu (8):
  ACPI: Fix a bug in parsing ACPI Memory24 resource
  ACPI: Normalize return value of resource parser functions
  ACPI: Set flag IORESOURCE_UNSET for unassigned resources
  ACPI: Enforce stricter checks for address space descriptors
  ACPI: Return translation offset when parsing ACPI address space resources
  ACPI: Translate resource into master side address for bridge window resources
  ACPI: Add field offset to struct resource_list_entry
  ACPI: Introduce helper function acpi_dev_filter_resource_type()

Markus Elfring (1):
  cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation

Rafael J. Wysocki (14):
  ACPI / cpuidle: Drop unnecessary calls from acpi_idle_do_entry()
  ACPI / cpuidle: Drop unnecessary calls from -enter callback routines
  ACPI / cpuidle: Clean up fallback to C1 checks
  ACPI / cpuidle: Drop irrelevant comment from acpi_idle_enter_simple()
  ACPI / cpuidle: Clean up white space in a switch statement
  ACPI / cpuidle: Drop flags.bm_check tests from acpi_idle_enter_bm()
  ACPI / cpuidle: Merge acpi_idle_enter_c1() and

Re: linux-next: Tree for Feb 4

On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
 On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au 
 wrote:
  Hi all,
 
  The next release I will be making will be next-20150209 - which will
  probably be after the v3.19 release.
 
  Changes since 20150203:
 
  The sound-asoc tree gained a conflict against the sound tree.
 
  The scsi tree gained a build failure caused by an interaction with the
  driver-core tree.  I applied a merge fix patch.
 
  The akpm-current tree gained a build failure for which I disabled
  CONFIG_KASAN.
 
  Non-merge commits (relative to Linus' tree): 7461
   7314 files changed, 309736 insertions(+), 172363 deletions(-)
 
  
 

 [ CC linux-rcu | linux-pm | intel_pstate maintainers ]

 Dirk is not the maintainer of intel_pstate any more, CC: Kristen.

 Hi,

 after suspend-and-resume I see the following call-trace:

 Do you see that after CPU1 offline too?


NO.

After...

root# echo 0 > /sys/devices/system/cpu/cpu1/online

...I see this:

+[  707.936668] PM: Saving platform NVS memory
+[  707.936674] Disabling non-boot CPUs ...
+[  707.936712] intel_pstate CPU 2 exiting
+[  707.938024] smpboot: CPU 2 didn't die...
+[  707.949128] intel_pstate CPU 3 exiting
+[  707.950369] smpboot: CPU 3 didn't die...
+[  707.966248] ACPI: Low-level resume complete
+[  707.966302] PM: Restoring platform NVS memory

Full dmesg attached.

I hope this helps.

- Sedat -

 ...
 [ 1144.482666] Disabling non-boot CPUs ...
 [ 1144.483000] intel_pstate CPU 1 exiting
 [ 1144.486064]
 [ 1144.486065] ===
 [ 1144.486067] smpboot: CPU 1 didn't die...
 [ 1144.486067] [ INFO: suspicious RCU usage. ]
 [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
 [ 1144.486070] ---
 [ 1144.486072] include/trace/events/tlb.h:35 suspicious
 rcu_dereference_check() usage!
 [ 1144.486073]
 [ 1144.486073] other info that might help us debug this:
 [ 1144.486073]
 [ 1144.486074]
 [ 1144.486074] RCU used illegally from offline CPU!
 [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
 [ 1144.486076] no locks held by swapper/1/0.
 [ 1144.486076]
 [ 1144.486076] stack backtrace:
 [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
 3.19.0-rc7-next-20150204.1-iniza-small #1
 [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
 [ 1144.486085]  0001 88011a44fe18 817e370d
 0011
 [ 1144.486088]  88011a448290 88011a44fe48 810d6847
 8800c66b9600
 [ 1144.486091]  0001 88011a44c000 81cb3900
 88011a44fe78
 [ 1144.486092] Call Trace:
 [ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
 [ 1144.486104]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
 [ 1144.486109]  [810b71a5] idle_task_exit+0x205/0x2c0
 [ 1144.486113]  [81054c4e] play_dead_common+0xe/0x50
 [ 1144.486116]  [81054ca5] native_play_dead+0x15/0x140
 [ 1144.486121]  [8102963f] arch_cpu_idle_dead+0xf/0x20
 [ 1144.486123]  [810cd89e] cpu_startup_entry+0x37e/0x580
 [ 1144.486126]  [81053e20] start_secondary+0x140/0x150
 [ 1144.502920] intel_pstate CPU 2 exiting
 ...

 Not sure if this comes from the rcu or pm/intel_pstate area.

 New intel_pstate commits in linux-next are between 7ab0256e57ae and
 a04759924e25 inclusive.  Please check that range first.

 If that doesn't point you to the offender, you can pull the linux-next
 branch of the linux-pm.git tree at:

 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next

 and see if that alone triggers the issue for you.  If not, the offender is
 not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
 check the acpi-processor merge point too.


 --
 I speak only for myself.
 Rafael J. Wysocki, Intel Open Source Technology Center.
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.19.0-rc7-next-20150204.2-iniza-small 
(sedat.di...@gmail.com@fambox) (gcc version 4.9.2 (Ubuntu 4.9.2-0ubuntu1~12.04) 
) #1 SMP Wed Feb 4 23:25:30 CET 2015
[0.00] Command line: 
BOOT_IMAGE=/boot/vmlinuz-3.19.0-rc7-next-20150204.2-iniza-small 
root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] Disabled fast string operations
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable
[0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved
[0.00]

Re: linux-next: Tree for Feb 4

On Wednesday, February 04, 2015 11:38:40 PM Sedat Dilek wrote:
 On Wed, Feb 4, 2015 at 10:54 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:
  On Wednesday, February 04, 2015 09:18:03 PM Sedat Dilek wrote:
  On Wed, Feb 4, 2015 at 9:35 AM, Stephen Rothwell s...@canb.auug.org.au 
  wrote:
   Hi all,
  
   The next release I will be making will be next-20150209 - which will
   probably be after the v3.19 release.
  
   Changes since 20150203:
  
   The sound-asoc tree gained a conflict against the sound tree.
  
   The scsi tree gained a build failure caused by an interaction with the
   driver-core tree.  I applied a merge fix patch.
  
   The akpm-current tree gained a build failure for which I disabled
   CONFIG_KASAN.
  
   Non-merge commits (relative to Linus' tree): 7461
7314 files changed, 309736 insertions(+), 172363 deletions(-)
  
   
  
 
  [ CC linux-rcu | linux-pm | intel_pstate maintainers ]
 
  Dirk is not the maintainer of intel_pstate any more, CC: Kristen.
 
 
 Yupp, I forwarded my original posting before you answered me.
 
  Hi,
 
  after suspend-and-resume I see the following call-trace:
 
  Do you see that after CPU1 offline too?
 
 
 Did not check yet.
 
  ...
  [ 1144.482666] Disabling non-boot CPUs ...
  [ 1144.483000] intel_pstate CPU 1 exiting
  [ 1144.486064]
  [ 1144.486065] ===
  [ 1144.486067] smpboot: CPU 1 didn't die...
  [ 1144.486067] [ INFO: suspicious RCU usage. ]
  [ 1144.486069] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
  [ 1144.486070] ---
  [ 1144.486072] include/trace/events/tlb.h:35 suspicious
  rcu_dereference_check() usage!
  [ 1144.486073]
  [ 1144.486073] other info that might help us debug this:
  [ 1144.486073]
  [ 1144.486074]
  [ 1144.486074] RCU used illegally from offline CPU!
  [ 1144.486074] rcu_scheduler_active = 1, debug_locks = 0
  [ 1144.486076] no locks held by swapper/1/0.
  [ 1144.486076]
  [ 1144.486076] stack backtrace:
  [ 1144.486079] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
  3.19.0-rc7-next-20150204.1-iniza-small #1
  [ 1144.486080] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
  530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
  [ 1144.486085]  0001 88011a44fe18 817e370d
  0011
  [ 1144.486088]  88011a448290 88011a44fe48 810d6847
  8800c66b9600
  [ 1144.486091]  0001 88011a44c000 81cb3900
  88011a44fe78
  [ 1144.486092] Call Trace:
  [ 1144.486099]  [817e370d] dump_stack+0x4c/0x65
  [ 1144.486104]  [810d6847] lockdep_rcu_suspicious+0xe7/0x120
  [ 1144.486109]  [810b71a5] idle_task_exit+0x205/0x2c0
  [ 1144.486113]  [81054c4e] play_dead_common+0xe/0x50
  [ 1144.486116]  [81054ca5] native_play_dead+0x15/0x140
  [ 1144.486121]  [8102963f] arch_cpu_idle_dead+0xf/0x20
  [ 1144.486123]  [810cd89e] cpu_startup_entry+0x37e/0x580
  [ 1144.486126]  [81053e20] start_secondary+0x140/0x150
  [ 1144.502920] intel_pstate CPU 2 exiting
  ...
 
  Not sure if this comes from the rcu or pm/intel_pstate area.
 
  New intel_pstate commits in linux-next are between 7ab0256e57ae and
  a04759924e25 inclusive.  Please check that range first.
 
 
 Not sure if I am willing to test with reverted patches.
 ( /me was updating the Linux graphics driver stack today, built with the
 upcoming llvm-toolchain v3.6.0. )
 
  If that doesn't point you to the offender, you can pull the linux-next
  branch of the linux-pm.git tree at:
 
  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
 
  and see if that alone triggers the issue for you.  If not, the offender is
  not there.  Otherwise, and if you use the ACPI cpuidle driver, you can
  check the acpi-processor merge point too.
 
 
 I pulled in pm-next-20150204 on top of next-20150204, but that did not help.

What I was asking about was to test linux-pm.git/linux-next *instead* *of* full
linux-next and not on top of it.  That would tell you whether or not the new
trace was introduced by one of the PM commits or elsewhere.

But this most likely is what Paul said anyway.


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: linux-next: Tree for Feb 4