Re: [PATCH 3/3] tracing/hwlat: Fix a few trivial nits

2019-10-14 Thread Srivatsa S. Bhat
On 10/14/19 12:24 PM, Steven Rostedt wrote:
> On Thu, 10 Oct 2019 11:51:17 -0700
> "Srivatsa S. Bhat"  wrote:
> 
>> From: Srivatsa S. Bhat (VMware) 
>>
>> Update the source file name in the comments, and fix a grammatical
>> error.
> 
> Patch 1 and 2 have already been applied to Linus's tree.
> 
> I've queued this one up for the next merge window.
> 
> Thanks!
> 

Thanks a lot Steve!

Regards,
Srivatsa


[PATCH 3/3] tracing/hwlat: Fix a few trivial nits

2019-10-10 Thread Srivatsa S. Bhat
From: Srivatsa S. Bhat (VMware) 

Update the source file name in the comments, and fix a grammatical
error.

Signed-off-by: Srivatsa S. Bhat (VMware) 
---

 kernel/trace/trace_hwlat.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 862f4b0..941cb82 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * trace_hwlatdetect.c - A simple Hardware Latency detector.
+ * trace_hwlat.c - A simple Hardware Latency detector.
  *
  * Use this tracer to detect large system latencies induced by the behavior of
  * certain underlying system hardware or firmware, independent of Linux itself.
@@ -276,7 +276,7 @@ static void move_to_next_cpu(void)
return;
/*
 * If for some reason the user modifies the CPU affinity
-* of this thread, than stop migrating for the duration
+* of this thread, then stop migrating for the duration
 * of the current test.
 */
if (!cpumask_equal(current_mask, current->cpus_ptr))



[PATCH 2/3] tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency

2019-10-10 Thread Srivatsa S. Bhat
From: Srivatsa S. Bhat (VMware) 

max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.

Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer")
Cc: sta...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) 
---

 kernel/trace/trace_hwlat.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index a0251a7..862f4b0 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -256,6 +256,8 @@ static int get_sample(void)
/* Keep a running maximum ever recorded hardware latency */
if (sample > tr->max_latency)
tr->max_latency = sample;
+   if (outer_sample > tr->max_latency)
+   tr->max_latency = outer_sample;
}
 
 out:
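
For reference, a minimal standalone illustration (not part of the patch; the
names are local to this sketch, not the kernel's) of what the two added lines
above provide: the running maximum has to consider whichever of the inner or
outer sample is larger.

#include <stdio.h>

typedef unsigned long long u64;

static u64 max_latency;

/* Mirror of the fixed logic: update the running max from *both* samples. */
static void record_sample(u64 sample, u64 outer_sample)
{
        if (sample > max_latency)
                max_latency = sample;
        if (outer_sample > max_latency)         /* previously ignored */
                max_latency = outer_sample;
}

int main(void)
{
        record_sample(12, 37);  /* the outer-loop sample is the larger one */
        record_sample(20, 5);
        printf("max_latency = %llu\n", max_latency);    /* prints 37, not 20 */
        return 0;
}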



[PATCH 1/3] tracing/hwlat: Report total time spent in all NMIs during the sample

2019-10-10 Thread Srivatsa S. Bhat
From: Srivatsa S. Bhat (VMware) 

nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.

Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector")
Cc: sta...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) 
---

 kernel/trace/trace_hwlat.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index fa95139..a0251a7 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -150,7 +150,7 @@ void trace_hwlat_callback(bool enter)
if (enter)
nmi_ts_start = time_get();
else
-   nmi_total_ts = time_get() - nmi_ts_start;
+   nmi_total_ts += time_get() - nmi_ts_start;
}
 
if (enter)
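
For reference, a tiny standalone sketch (hypothetical names, not kernel code)
of the difference the one-character change above makes: the time spent in
every NMI during the window is accumulated instead of being overwritten by
the most recent one.

#include <stdio.h>

typedef unsigned long long u64;

static u64 nmi_total_ts, nmi_ts_start, fake_clock;

static u64 time_get(void) { return fake_clock; }        /* stand-in clock */

static void nmi_enter(void) { nmi_ts_start = time_get(); }
static void nmi_exit(void)  { nmi_total_ts += time_get() - nmi_ts_start; }

int main(void)
{
        fake_clock = 100; nmi_enter(); fake_clock = 130; nmi_exit();    /* 30 */
        fake_clock = 500; nmi_enter(); fake_clock = 520; nmi_exit();    /* 20 */
        printf("total NMI time: %llu\n", nmi_total_ts); /* 50 with +=, 20 with = */
        return 0;
}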



Re: [PATCH BUGFIX IMPROVEMENT 0/7] boost throughput with synced I/O, reduce latency and fix a bandwidth bug

2019-06-24 Thread Srivatsa S. Bhat
On 6/24/19 12:40 PM, Paolo Valente wrote:
> Hi Jens,
> this series, based against for-5.3/block, contains:
> 1) The improvements to recover the throughput loss reported by
>Srivatsa [1] (first five patches)
> 2) A preemption improvement to reduce I/O latency
> 3) A fix of a subtle bug causing loss of control over I/O bandwidths
> 

Thanks a lot for these patches, Paolo!

Would you mind adding:

Reported-by: Srivatsa S. Bhat (VMware) 
Tested-by: Srivatsa S. Bhat (VMware) 

to the first 5 patches, as appropriate?

Thank you!

> 
> [1] https://lkml.org/lkml/2019/5/17/755
> 
> Paolo Valente (7):
>   block, bfq: reset inject limit when think-time state changes
>   block, bfq: fix rq_in_driver check in bfq_update_inject_limit
>   block, bfq: update base request service times when possible
>   block, bfq: bring forward seek time update
>   block, bfq: detect wakers and unconditionally inject their I/O
>   block, bfq: preempt lower-weight or lower-priority queues
>   block, bfq: re-schedule empty queues if they deserve I/O plugging
> 
>  block/bfq-iosched.c | 952 ++--
>  block/bfq-iosched.h |  25 +-
>  2 files changed, 686 insertions(+), 291 deletions(-)
> 

Regards,
Srivatsa
VMware Photon OS


Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller

2019-05-22 Thread Srivatsa S. Bhat
On 5/22/19 2:12 AM, Paolo Valente wrote:
> 
>> On 22 May 2019, at 11:02, Srivatsa S. Bhat
>>  wrote:
>>
>>
>> Let's continue here on LKML itself.
> 
> Just done :)
> 
>> The only reason I created the
>> bugzilla entry is to attach the tarball of the traces, assuming
>> that it would allow me to upload a 20 MB file (since email attachment
>> didn't work). But bugzilla's file restriction is much smaller than
>> that, so it didn't work out either, and I resorted to using dropbox.
>> So we don't need the bugzilla entry anymore; I might as well close it
>> to avoid confusion.
>>
> 
> No no, don't close it: it can reach people that don't use LKML.  We
> just have to remember to report back at the end of this.

Ah, good point!

>  BTW, I also
> think that the bug is incorrectly filed against 5.1, while all these
> tests and results concern 5.2-rcX.
> 

Fixed now, thank you for pointing that out!
 
Regards,
Srivatsa
VMware Photon OS


Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller

2019-05-22 Thread Srivatsa S. Bhat
On 5/22/19 2:09 AM, Paolo Valente wrote:
> 
> First, thank you very much for testing my patches, and, above all, for
> sharing those huge traces!
> 
> According to your traces, the residual 20% lower throughput that you
> record is due to the fact that the BFQ injection mechanism takes a few
> hundredths of a second to stabilize at the beginning of the workload.
> During that setup time, the throughput is equal to the dreadful ~60-90 KB/s
> that you see without this new patch.  After that time, there
> seems to be no loss according to the trace.
> 
> The problem is that a loss lasting only a few hundredths of a second is,
> however, not negligible for a write workload that lasts only 3-4
> seconds.  Could you please try writing a larger file?
> 

I tried running dd for longer (about 100 seconds), but still saw around
1.4 MB/s throughput with BFQ, and between 1.5 MB/s and 1.6 MB/s with
mq-deadline and noop. But I'm not too worried about that difference.

> In addition, I wanted to ask you whether you measured BFQ throughput
> with traces disabled.  This may make a difference.
> 

The above result (1.4 MB/s) was obtained with traces disabled.

> After trying to write a larger file, you can try with low_latency on.
> On my side, it causes results to become a little unstable across
> repetitions (which is expected).
> 

With low_latency on, I get between 60 KB/s and 100 KB/s.
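
(For context: BFQ exposes low_latency as a per-device sysfs tunable; a minimal
sketch of toggling it programmatically is below. The device name "sda" is a
placeholder, and the path assumes bfq is the active scheduler for that device.)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        /* Placeholder device; adjust to the disk under test. */
        const char *path = "/sys/block/sda/queue/iosched/low_latency";
        const char *val = (argc > 1) ? argv[1] : "1";   /* "1" = on, "0" = off */
        int fd = open(path, O_WRONLY);

        if (fd < 0) {
                perror("open low_latency");
                return 1;
        }
        if (write(fd, val, strlen(val)) < 0)
                perror("write low_latency");
        close(fd);
        return 0;
}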

Regards,
Srivatsa
VMware Photon OS


[PATCH] tracing: Fix documentation about disabling options using trace_options

2019-01-28 Thread Srivatsa S. Bhat
From: Srivatsa S. Bhat (VMware) 

To disable a tracing option using the trace_options file, the option
name needs to be prefixed with 'no', not suffixed with it as the README
currently states. Fix it.

Signed-off-by: Srivatsa S. Bhat (VMware) 
---

 kernel/trace/trace.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c521b73..d632458 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4541,7 +4541,7 @@ static const char readme_msg[] =
"  instances\t\t- Make sub-buffers with: mkdir instances/foo\n"
"\t\t\t  Remove sub-buffer with rmdir\n"
"  trace_options\t\t- Set format or modify how tracing happens\n"
-   "\t\t\t  Disable an option by adding a suffix 'no' to the\n"
+   "\t\t\t  Disable an option by prefixing 'no' to the\n"
"\t\t\t  option name\n"
"  saved_cmdlines_size\t- echo command number in here to store comm-pid 
list\n"
 #ifdef CONFIG_DYNAMIC_FTRACE
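
For context, a small illustrative snippet of the convention the corrected text
describes: writing the option name with a "no" prefix into trace_options
disables it. (The tracefs mount point below is an assumption; older setups use
/sys/kernel/debug/tracing instead.)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/sys/kernel/tracing/trace_options";
        const char *cmd = "noprint-parent";     /* "print-parent" re-enables it */
        int fd = open(path, O_WRONLY);

        if (fd < 0) {
                perror("open trace_options");
                return 1;
        }
        if (write(fd, cmd, strlen(cmd)) < 0)
                perror("write trace_options");
        close(fd);
        return 0;
}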



[PATCH 4.4.y 046/101] x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP

2018-07-14 Thread Srivatsa S. Bhat
From: Ingo Molnar 

commit d72f4e29e6d84b7ec02ae93088aa459ac70e733b upstream.

firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.

Since we want to keep <asm/nospec-branch.h> low level, convert
them to macros to avoid header hell...

Cc: David Woodhouse 
Cc: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: arjan.van.de@intel.com
Cc: b...@alien8.de
Cc: dave.han...@intel.com
Cc: jmatt...@google.com
Cc: karah...@amazon.de
Cc: k...@vger.kernel.org
Cc: pbonz...@redhat.com
Cc: rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Srivatsa S. Bhat 
Reviewed-by: Matt Helsley (VMware) 
Reviewed-by: Alexey Makhalov 
Reviewed-by: Bo Gan 
---

 arch/x86/include/asm/nospec-branch.h |   26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h 
b/arch/x86/include/asm/nospec-branch.h
index 36ded24..b9dd1d9 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,20 +214,22 @@ static inline void 
indirect_branch_prediction_barrier(void)
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
+ *
+ * (Implemented as CPP macros due to header hell.)
  */
-static inline void firmware_restrict_branch_speculation_start(void)
-{
-   preempt_disable();
-   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
- X86_FEATURE_USE_IBRS_FW);
-}
+#define firmware_restrict_branch_speculation_start()   \
+do {   \
+   preempt_disable();  \
+   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,   \
+ X86_FEATURE_USE_IBRS_FW); \
+} while (0)
 
-static inline void firmware_restrict_branch_speculation_end(void)
-{
-   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
- X86_FEATURE_USE_IBRS_FW);
-   preempt_enable();
-}
+#define firmware_restrict_branch_speculation_end() \
+do {   \
+   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,\
+ X86_FEATURE_USE_IBRS_FW); \
+   preempt_enable();   \
+} while (0)
 
 #endif /* __ASSEMBLY__ */
 



[PATCH 4.4.y 033/101] x86/asm/entry/32: Simplify pushes of zeroed pt_regs->REGs

2018-07-14 Thread Srivatsa S. Bhat
From: Denys Vlasenko 

commit 778843f934e362ed4ed734520f60a44a78a074b4 upstream

Use of a temporary R8 register here seems to be unnecessary.

"push %r8" is a two-byte insn (it needs REX prefix to specify R8),
"push $0" is two-byte too. It seems just using the latter would be
no worse.

Thus, code had an unnecessary "xorq %r8,%r8" insn.
It probably costs nothing in execution time here since we are probably
limited by store bandwidth at this point, but still.

Run-tested under QEMU: 32-bit calls still work:

 / # ./test_syscall_vdso32
 [RUN]  Executing 6-argument 32-bit syscall via VDSO
 [OK]   Arguments are preserved across syscall
 [NOTE] R11 has changed:00200ed7 - assuming clobbered by SYSRET insn
 [OK]   R8..R15 did not leak kernel data
 [RUN]  Executing 6-argument 32-bit syscall via INT 80
 [OK]   Arguments are preserved across syscall
 [OK]   R8..R15 did not leak kernel data
 [RUN]  Running tests under ptrace
 [RUN]  Executing 6-argument 32-bit syscall via VDSO
 [OK]   Arguments are preserved across syscall
 [NOTE] R11 has changed:00200ed7 - assuming clobbered by SYSRET insn
 [OK]   R8..R15 did not leak kernel data
 [RUN]  Executing 6-argument 32-bit syscall via INT 80
 [OK]   Arguments are preserved across syscall
 [OK]   R8..R15 did not leak kernel data

Signed-off-by: Denys Vlasenko 
Acked-by: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Frederic Weisbecker 
Cc: H. Peter Anvin 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: Will Drewry 
Cc: linux-kernel@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1462201010-16846-1-git-send-email-dvlas...@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Srivatsa S. Bhat 
Reviewed-by: Matt Helsley (VMware) 
Reviewed-by: Alexey Makhalov 
Reviewed-by: Bo Gan 
---

 arch/x86/entry/entry_64_compat.S |   45 ++
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index d03bf0e..e479ff8 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -79,24 +79,23 @@ ENTRY(entry_SYSENTER_compat)
ASM_CLAC/* Clear AC after saving FLAGS */
 
pushq   $__USER32_CS/* pt_regs->cs */
-   xorq%r8,%r8
-   pushq   %r8 /* pt_regs->ip = 0 (placeholder) */
+   pushq   $0  /* pt_regs->ip = 0 (placeholder) */
pushq   %rax/* pt_regs->orig_ax */
pushq   %rdi/* pt_regs->di */
pushq   %rsi/* pt_regs->si */
pushq   %rdx/* pt_regs->dx */
pushq   %rcx/* pt_regs->cx */
pushq   $-ENOSYS/* pt_regs->ax */
-   pushq   %r8 /* pt_regs->r8  = 0 */
-   pushq   %r8 /* pt_regs->r9  = 0 */
-   pushq   %r8 /* pt_regs->r10 = 0 */
-   pushq   %r8 /* pt_regs->r11 = 0 */
+   pushq   $0  /* pt_regs->r8  = 0 */
+   pushq   $0  /* pt_regs->r9  = 0 */
+   pushq   $0  /* pt_regs->r10 = 0 */
+   pushq   $0  /* pt_regs->r11 = 0 */
pushq   %rbx/* pt_regs->rbx */
pushq   %rbp/* pt_regs->rbp (will be overwritten) */
-   pushq   %r8 /* pt_regs->r12 = 0 */
-   pushq   %r8 /* pt_regs->r13 = 0 */
-   pushq   %r8 /* pt_regs->r14 = 0 */
-   pushq   %r8 /* pt_regs->r15 = 0 */
+   pushq   $0  /* pt_regs->r12 = 0 */
+   pushq   $0  /* pt_regs->r13 = 0 */
+   pushq   $0  /* pt_regs->r14 = 0 */
+   pushq   $0  /* pt_regs->r15 = 0 */
cld
 
/*
@@ -185,17 +184,16 @@ ENTRY(entry_SYSCALL_compat)
pushq   %rdx/* pt_regs->dx */
pushq   %rbp/* pt_regs->cx (stashed in bp) */
pushq   $-ENOSYS/* pt_regs->ax */
-   xorq%r8,%r8
-   pushq   %r8 /* pt_regs->r8  = 0 */
-   pushq   %r8 /* pt_regs->r9  = 0 */
-   pushq   %r8 /* pt_regs->r10 = 0 */
-   pushq   %r8 /* pt_regs->r11 = 0 */
+   pushq   $0  /* pt_regs->r8  = 0 */
+   pushq   $0  /* pt_regs->r9  = 0 */
+   pushq   $0  /* pt_regs->r10 = 0 */
+   pushq   $0  /* pt_regs->r11 = 0 */
pushq  

[PATCH 4.4.y 037/101] x86/speculation: Clean up various Spectre related details

2018-07-14 Thread Srivatsa S. Bhat
From: Ingo Molnar 

commit 21e433bdb95bdf3aa48226fd3d33af608437f293 upstream.

Harmonize all the Spectre messages so that a:

dmesg | grep -i spectre

... gives us most Spectre related kernel boot messages.

Also fix a few other details:

 - clarify a comment about firmware speculation control

 - s/KPTI/PTI

 - remove various line-breaks that made the code uglier

Acked-by: David Woodhouse 
Cc: Andy Lutomirski 
Cc: Arjan van de Ven 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: David Woodhouse 
Cc: Greg Kroah-Hartman 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Srivatsa S. Bhat 
Reviewed-by: Matt Helsley (VMware) 
Reviewed-by: Alexey Makhalov 
Reviewed-by: Bo Gan 
---

 arch/x86/kernel/cpu/bugs.c |   25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1968baf..fea368d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -162,8 +162,7 @@ static enum spectre_v2_mitigation_cmd __init 
spectre_v2_parse_cmdline(void)
if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
return SPECTRE_V2_CMD_NONE;
else {
-   ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
- sizeof(arg));
+   ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, 
sizeof(arg));
if (ret < 0)
return SPECTRE_V2_CMD_AUTO;
 
@@ -184,8 +183,7 @@ static enum spectre_v2_mitigation_cmd __init 
spectre_v2_parse_cmdline(void)
 cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
 cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
!IS_ENABLED(CONFIG_RETPOLINE)) {
-   pr_err("%s selected but not compiled in. Switching to AUTO 
select\n",
-  mitigation_options[i].option);
+   pr_err("%s selected but not compiled in. Switching to AUTO 
select\n", mitigation_options[i].option);
return SPECTRE_V2_CMD_AUTO;
}
 
@@ -255,14 +253,14 @@ static void __init spectre_v2_select_mitigation(void)
goto retpoline_auto;
break;
}
-   pr_err("kernel not compiled with retpoline; no mitigation available!");
+   pr_err("Spectre mitigation: kernel not compiled with retpoline; no 
mitigation available!");
return;
 
 retpoline_auto:
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
retpoline_amd:
if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
-   pr_err("LFENCE not serializing. Switching to generic 
retpoline\n");
+   pr_err("Spectre mitigation: LFENCE not serializing, 
switching to generic retpoline\n");
goto retpoline_generic;
}
mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
@@ -280,7 +278,7 @@ retpoline_auto:
pr_info("%s\n", spectre_v2_strings[mode]);
 
/*
-* If neither SMEP or KPTI are available, there is a risk of
+* If neither SMEP nor PTI are available, there is a risk of
 * hitting userspace addresses in the RSB after a context switch
 * from a shallow call stack to a deeper one. To prevent this fill
 * the entire RSB, even when using IBRS.
@@ -294,21 +292,20 @@ retpoline_auto:
if ((!boot_cpu_has(X86_FEATURE_KAISER) &&
 !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-   pr_info("Filling RSB on context switch\n");
+   pr_info("Spectre v2 mitigation: Filling RSB on context 
switch\n");
}
 
/* Initialize Indirect Branch Prediction Barrier if supported */
if (boot_cpu_has(X86_FEATURE_IBPB)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
-   pr_info("Enabling Indirect Branch Prediction Barrier\n");
+   pr_info("Spectre v2 mitigation: Enabling Indirect Branch 
Prediction Barrier\n");
}
 }
 
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
-ssize_t cpu_show_meltdown(struct device *dev,
- struct device_attribute *attr, char *buf)
+ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, 
char *buf)
 {
if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
return sprintf(buf, "Not affected\n");
@@ -317,16 +314,14 @@ ssize_t cpu_show_meltdown(struct device *dev,
return sprintf(buf, "Vulnerable\n");
 }
 
-ssize_t cpu_show_spectre_v1(struct device *dev,
-   struct


Re: [PATCH 1/5] random: fix crng_ready() test

2018-05-16 Thread Srivatsa S. Bhat
On 4/13/18 10:00 AM, Theodore Y. Ts'o wrote:
> On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote:
>>
>> What I would like to point out is that more and more folks change to 
>> getrandom(2). As this call will now unblock much later in the boot cycle, 
>> these systems see a significant departure from the current system behavior.
>>
>> E.g. an sshd using getrandom(2) would be ready shortly after the boot 
>> finishes 
>> as of now. Now it can be a matter of minutes before it responds. Thus, is such a 
>> change in the kernel behavior something for stable?

[...]

> I was a little worried that on VM's this could end up causing things
> to block for a long time, but an experiment on a GCE VM shows that
> isn't a problem:
> 
> [0.00] Linux version 4.16.0-rc3-ext4-9-gf6b302ebca85 (tytso@cwcc) 
> (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018
> [1.282220] random: fast init done
> [3.987092] random: crng init done
> [4.376787] EXT4-fs (sda1): re-mounted. Opts: (null)
> 
> There are some desktops where the "crng_init done" report doesn't
> happen until 45-90 seconds into the boot.  I don't think I've seen
> reports where it takes _minutes_ however.  Can you give me some
> examples of such cases?

On a Photon OS VM running on VMware ESXi, this patch causes a boot speed
regression of 5 minutes :-( [ The VM doesn't have haveged or rng-tools
(rngd) installed. ]

[1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered
[1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz
[1.470707] clocksource: tsc: mask: 0x max_cycles: 
0x36c65c1a9e1, max_idle_ns: 881590695311 ns
[1.474249] clocksource: Switched to clocksource tsc
[1.584427] systemd-journald[216]: Received request to flush runtime journal 
from PID 1
[  346.620718] random: crng init done

Interestingly, the boot delay is exacerbated on VMs with large amounts
of RAM. For example, the delay is not so noticeable (< 30 seconds) on a
VM with 2GB memory, but goes up to 5 minutes on an 8GB VM.

Also, cloud-init-local.service seems to be the one blocking for entropy
here. systemd-analyze critical-chain shows:

The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

multi-user.target @6min 1.283s
└─vmtoolsd.service @6min 1.282s
  └─cloud-final.service @6min 366ms +914ms
└─cloud-config.service @5min 59.174s +1.190s
  └─cloud-config.target @5min 59.172s
└─cloud-init.service @5min 47.423s +11.744s
  └─systemd-networkd-wait-online.service @5min 45.999s +1.420s
└─systemd-networkd.service @5min 45.975s +21ms
  └─network-pre.target @5min 45.973s
└─cloud-init-local.service @241ms +5min 45.687s
  └─systemd-remount-fs.service @222ms +13ms
└─systemd-fsck-root.service @193ms +26ms
  └─systemd-journald.socket @188ms
└─-.mount @151ms
  └─system.slice @161ms
└─-.slice @151ms

It would be great if this CVE could be fixed somehow without causing boot speed
to spike from ~20 seconds to 5 minutes, as that makes the system pretty much
unusable. I can work around this by installing haveged, but ideally an in-kernel
fix would be better. If you need any other info about my setup or if you have
a patch that I can test, please let me know!
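
(As an aside, for anyone who wants to check from userspace whether the crng is
ready without risking this kind of blocking: a minimal sketch using
getrandom(2) with GRND_NONBLOCK is below; it assumes glibc >= 2.25 for
<sys/random.h>.)

#include <errno.h>
#include <stdio.h>
#include <sys/random.h>

int main(void)
{
        unsigned char buf[16];
        /* With GRND_NONBLOCK, getrandom() fails with EAGAIN instead of
         * blocking until "random: crng init done". */
        ssize_t ret = getrandom(buf, sizeof(buf), GRND_NONBLOCK);

        if (ret == (ssize_t)sizeof(buf))
                printf("crng ready: got %zd random bytes\n", ret);
        else if (ret < 0 && errno == EAGAIN)
                printf("crng not initialized yet; a blocking call would wait\n");
        else
                perror("getrandom");
        return 0;
}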

Thank you very much!

Regards,
Srivatsa
VMware Photon OS



Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-03-21 Thread Srivatsa S. Bhat
On 3/21/18 10:12 PM, Srivatsa S. Bhat wrote:
> On 3/21/18 7:02 PM, Steve French wrote:
>> Found a patch which solves the dependency issue.  In my testing (on
>> 4.9, with Windows 2016, and also to Samba) as Pavel suggested this
>> appears to fix the problem, but I will let Srivatsa confirm that it
>> also fixes it for him.  The two attached patches for 4.9 should work.
>>
> 
> Indeed, those two patches fix the problem for me on 4.9. Thanks a lot
> Steve, Pavel and Aurelien for all your efforts in fixing this!
> 
> I was also interested in getting this fixed on 4.4, so I modified the
> patches to apply on 4.4.88 and verified that they fix the mount

I meant to say 4.4.122 there (the latest stable 4.4 version at the moment).

Regards,
Srivatsa
VMware Photon OS



Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-03-21 Thread Srivatsa S. Bhat
On 3/21/18 7:02 PM, Steve French wrote:
> Found a patch which solves the dependency issue.  In my testing (on
> 4.9, with Windows 2016, and also to Samba) as Pavel suggested this
> appears to fix the problem, but I will let Srivatsa confirm that it
> also fixes it for him.  The two attached patches for 4.9 should work.
> 

Indeed, those two patches fix the problem for me on 4.9. Thanks a lot
Steve, Pavel and Aurelien for all your efforts in fixing this!

I was also interested in getting this fixed on 4.4, so I modified the
patches to apply on 4.4.88 and verified that they fix the mount
failure. I have attached my patches for 4.4 with this mail.

Steve, Pavel, could you kindly double-check the second patch for 4.4,
especially around the keygen_exit error path?

Thank you very much!

Regards,
Srivatsa
VMware Photon OS
From a01a7dfb60e2d5421a487a7b81fd8a1bf72d96d4 Mon Sep 17 00:00:00 2001
From: Steve French <smfre...@gmail.com>
Date: Sun, 11 Mar 2018 20:00:27 -0700
Subject: [PATCH 1/2] SMB3: Validate negotiate request must always be signed

commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.

According to MS-SMB2 3.2.55 validate_negotiate request must
always be signed. Some Windows can fail the request if you send it unsigned

See kernel bugzilla bug 197311

[ Fixed up for kernel version 4.4 ]

CC: Stable <sta...@vger.kernel.org>
Acked-by: Ronnie Sahlberg 
Signed-off-by: Steve French <smfre...@gmail.com>
Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
---
 fs/cifs/smb2pdu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 84614a5..6dae5b8 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1558,6 +1558,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon 
*tcon, u64 persistent_fid,
} else
iov[0].iov_len = get_rfc1002_length(req) + 4;
 
+   /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
+   if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
+   req->hdr.Flags |= SMB2_FLAGS_SIGNED;
 
rc = SendReceive2(xid, ses, iov, num_iovecs, _buftype, 0);
rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base;
-- 
2.7.4

From d0178d8f096b29a88914787274bdc8ee8334ab07 Mon Sep 17 00:00:00 2001
From: Pavel Shilovsky <pshi...@microsoft.com>
Date: Mon, 7 Nov 2016 18:20:50 -0800
Subject: [PATCH 2/2] CIFS: Enable encryption during session setup phase

commit cabfb3680f78981d26c078a26e5c748531257ebb upstream.

In order to allow encryption on SMB connection we need to exchange
a session key and generate encryption and decryption keys.

[ Fixed up for kernel version 4.4 ]

Signed-off-by: Pavel Shilovsky <pshi...@microsoft.com>
Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
---
 fs/cifs/sess.c| 22 ++
 fs/cifs/smb2pdu.c |  8 +---
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c
index e88ffe1..a035d1a 100644
--- a/fs/cifs/sess.c
+++ b/fs/cifs/sess.c
@@ -344,13 +344,12 @@ void build_ntlmssp_negotiate_blob(unsigned char *pbuffer,
/* BB is NTLMV2 session security format easier to use here? */
flags = NTLMSSP_NEGOTIATE_56 |  NTLMSSP_REQUEST_TARGET |
NTLMSSP_NEGOTIATE_128 | NTLMSSP_NEGOTIATE_UNICODE |
-   NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC;
-   if (ses->server->sign) {
+   NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC |
+   NTLMSSP_NEGOTIATE_SEAL;
+   if (ses->server->sign)
flags |= NTLMSSP_NEGOTIATE_SIGN;
-   if (!ses->server->session_estab ||
-   ses->ntlmssp->sesskey_per_smbsess)
-   flags |= NTLMSSP_NEGOTIATE_KEY_XCH;
-   }
+   if (!ses->server->session_estab || ses->ntlmssp->sesskey_per_smbsess)
+   flags |= NTLMSSP_NEGOTIATE_KEY_XCH;
 
sec_blob->NegotiateFlags = cpu_to_le32(flags);
 
@@ -407,13 +406,12 @@ int build_ntlmssp_auth_blob(unsigned char **pbuffer,
flags = NTLMSSP_NEGOTIATE_56 |
NTLMSSP_REQUEST_TARGET | NTLMSSP_NEGOTIATE_TARGET_INFO |
NTLMSSP_NEGOTIATE_128 | NTLMSSP_NEGOTIATE_UNICODE |
-   NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC;
-   if (ses->server->sign) {
+   NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC |
+   NTLMSSP_NEGOTIATE_SEAL;
+   if (ses->server->sign)
flags |= NTLMSSP_NEGOTIATE_SIGN;
-   if (!ses->server->session_estab ||
-   ses->ntlmssp->sesskey_per_smbsess)
-   flags |= NTLMSSP_NEGOTIATE_KEY_XCH;
-   }
+   if (!ses->server->session_estab || ses->ntlmssp->sesskey_per_smbsess)
+   flags |= NTLMSSP_NEGOTIATE_KEY_XCH;
 
tmp = *pbuffer + 


Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-03-01 Thread Srivatsa S. Bhat
On 3/1/18 12:12 PM, Steve French wrote:
> So far I haven't been able to reproduce this on the current 4.9 stable
> tree with vers=3.0 or with default (vers=1.0 for these older kernels).
> 

Maybe the problem also depends on the particular version of Windows
that hosts the SMB shares? I'm using Windows Server 2016 (Version
1607, OS Build 14393.693). With vers=3.0, the issue is reproducible
every time, but vers=1.0 works fine.
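
(For reference, a minimal mount(2) sketch of the failing configuration; the
server, share, mount point and credentials are placeholders, and depending on
the setup, hostname resolution may additionally need an explicit ip= option or
the dns_resolver upcall.)

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /* Placeholders only -- not the actual test environment. */
        const char *src  = "//winserver2016/testshare";
        const char *opts = "vers=3.0,username=USER,password=PASS";

        if (mount(src, "/mnt/smb", "cifs", 0, opts)) {
                perror("mount cifs vers=3.0");
                return 1;
        }
        printf("mounted %s with SMB 3.0\n", src);
        return 0;
}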

Regards,
Srivatsa

> On Tue, Feb 27, 2018 at 11:56 AM, Steve French <smfre...@gmail.com> wrote:
>> This shouldn't be too hard to figure out if willing to backport a
>> slightly larger set of fixes to the older stable, but I don't have a
>> system running 4.9 stable.
>>
>> Is this the correct stable tree branch?
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/?h=linux-4.9.y
>>
>> On Tue, Feb 27, 2018 at 11:45 AM, Srivatsa S. Bhat
>> <sriva...@csail.mit.edu> wrote:
>>> On 2/27/18 4:40 AM, Greg Kroah-Hartman wrote:
>>>> On Tue, Feb 27, 2018 at 01:22:31AM -0800, Srivatsa S. Bhat wrote:
>>>>> On 2/27/18 12:54 AM, Greg Kroah-Hartman wrote:
>>>>>> On Mon, Feb 26, 2018 at 07:44:28PM -0800, Srivatsa S. Bhat wrote:
>>>>>>> On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote:
>>>>>>>> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote:
>>>>>>>>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote:
>>>>>>>>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman:
>>>>>>>>>>> 4.13-stable review patch.  If anyone has any objections, please let 
>>>>>>>>>>> me know.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> From: Steve French <smfre...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.
>>>>>>>>>>>
>>>>>>>>>>> According to MS-SMB2 3.2.55 validate_negotiate request must
>>>>>>>>>>> always be signed. Some Windows can fail the request if you send it 
>>>>>>>>>>> unsigned
>>>>>>>>>>>
>>>>>>>>>>> See kernel bugzilla bug 197311
>>>>>>>>>>>
>>>>>>>>>>> Acked-by: Ronnie Sahlberg 
>>>>>>>>>>> Signed-off-by: Steve French <smfre...@gmail.com>
>>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>>   fs/cifs/smb2pdu.c |3 +++
>>>>>>>>>>>   1 file changed, 3 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> --- a/fs/cifs/smb2pdu.c
>>>>>>>>>>> +++ b/fs/cifs/smb2pdu.c
>>>>>>>>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc
>>>>>>>>>>>} else
>>>>>>>>>>>iov[0].iov_len = get_rfc1002_length(req) + 4;
>>>>>>>>>>> +  /* validate negotiate request must be signed - see MS-SMB2 
>>>>>>>>>>> 3.2.5.5 */
>>>>>>>>>>> +  if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
>>>>>>>>>>> +  req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED;
>>>>>>>>>>>rc = SendReceive2(xid, ses, iov, n_iov, _buftype, 
>>>>>>>>>>> flags, _iov);
>>>>>>>>>>>cifs_small_buf_release(req);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This one needs to be backported to all stable kernels as the commit 
>>>>>>>>>> that
>>>>>>>>>> introduced the regression:
>>>>>>>>>> '
>>>>>>>>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9
>>>>>>>>>> SMB: Validate negotiate (to protect against downgrade) even if 
>>>>>>>>>> signing off
>>


Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-02-27 Thread Srivatsa S. Bhat
On 2/27/18 9:56 AM, Steve French wrote:
> This shouldn't be too hard to figure out if willing to backport a
> slightly larger set of fixes to the older stable, but I don't have a
> system running 4.9 stable.
> 

If you have the proposed patches that apply on 4.9, I'd be happy to
try them out!

[ I would have offered to backport the patches myself; in fact, I
already tried doing that with a larger set of patches from mainline
(picking those commits between the regression and the fix that seemed
relevant), but I felt quite out of my depth trying to adapt them to 4.9
and 4.4, as I'm not that familiar with the internals of SMB/CIFS. ]

> Is this the correct stable tree branch?
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/?h=linux-4.9.y
> 

Yep!

Regards,
Srivatsa

> On Tue, Feb 27, 2018 at 11:45 AM, Srivatsa S. Bhat
> <sriva...@csail.mit.edu> wrote:
>> On 2/27/18 4:40 AM, Greg Kroah-Hartman wrote:
>>> On Tue, Feb 27, 2018 at 01:22:31AM -0800, Srivatsa S. Bhat wrote:
>>>> On 2/27/18 12:54 AM, Greg Kroah-Hartman wrote:
>>>>> On Mon, Feb 26, 2018 at 07:44:28PM -0800, Srivatsa S. Bhat wrote:
>>>>>> On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote:
>>>>>>> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote:
>>>>>>>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote:
>>>>>>>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman:
>>>>>>>>>> 4.13-stable review patch.  If anyone has any objections, please let 
>>>>>>>>>> me know.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> From: Steve French <smfre...@gmail.com>
>>>>>>>>>>
>>>>>>>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.
>>>>>>>>>>
>>>>>>>>>> According to MS-SMB2 3.2.55 validate_negotiate request must
>>>>>>>>>> always be signed. Some Windows can fail the request if you send it 
>>>>>>>>>> unsigned
>>>>>>>>>>
>>>>>>>>>> See kernel bugzilla bug 197311
>>>>>>>>>>
>>>>>>>>>> Acked-by: Ronnie Sahlberg 
>>>>>>>>>> Signed-off-by: Steve French <smfre...@gmail.com>
>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>   fs/cifs/smb2pdu.c |3 +++
>>>>>>>>>>   1 file changed, 3 insertions(+)
>>>>>>>>>>
>>>>>>>>>> --- a/fs/cifs/smb2pdu.c
>>>>>>>>>> +++ b/fs/cifs/smb2pdu.c
>>>>>>>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc
>>>>>>>>>>} else
>>>>>>>>>>iov[0].iov_len = get_rfc1002_length(req) + 4;
>>>>>>>>>> +  /* validate negotiate request must be signed - see MS-SMB2 
>>>>>>>>>> 3.2.5.5 */
>>>>>>>>>> +  if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
>>>>>>>>>> +  req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED;
>>>>>>>>>>rc = SendReceive2(xid, ses, iov, n_iov, _buftype, flags, 
>>>>>>>>>> _iov);
>>>>>>>>>>cifs_small_buf_release(req);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This one needs to be backported to all stable kernels as the commit 
>>>>>>>>> that
>>>>>>>>> introduced the regression:
>>>>>>>>> '
>>>>>>>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9
>>>>>>>>> SMB: Validate negotiate (to protect against downgrade) even if 
>>>>>>>>> signing off
>>>>>>>>>
>>>>>>>>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73
>>>>>>>>
>>>>>>>> Oh wait, it breaks the builds on older kernels, that's why I didn't
>>>>>>>>

Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-02-27 Thread Srivatsa S. Bhat
On 2/27/18 4:40 AM, Greg Kroah-Hartman wrote:
> On Tue, Feb 27, 2018 at 01:22:31AM -0800, Srivatsa S. Bhat wrote:
>> On 2/27/18 12:54 AM, Greg Kroah-Hartman wrote:
>>> On Mon, Feb 26, 2018 at 07:44:28PM -0800, Srivatsa S. Bhat wrote:
>>>> On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote:
>>>>> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote:
>>>>>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote:
>>>>>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman:
>>>>>>>> 4.13-stable review patch.  If anyone has any objections, please let me 
>>>>>>>> know.
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> From: Steve French <smfre...@gmail.com>
>>>>>>>>
>>>>>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.
>>>>>>>>
>>>>>>>> According to MS-SMB2 3.2.55 validate_negotiate request must
>>>>>>>> always be signed. Some Windows can fail the request if you send it 
>>>>>>>> unsigned
>>>>>>>>
>>>>>>>> See kernel bugzilla bug 197311
>>>>>>>>
>>>>>>>> Acked-by: Ronnie Sahlberg 
>>>>>>>> Signed-off-by: Steve French <smfre...@gmail.com>
>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>   fs/cifs/smb2pdu.c |3 +++
>>>>>>>>   1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> --- a/fs/cifs/smb2pdu.c
>>>>>>>> +++ b/fs/cifs/smb2pdu.c
>>>>>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc
>>>>>>>>} else
>>>>>>>>iov[0].iov_len = get_rfc1002_length(req) + 4;
>>>>>>>> +  /* validate negotiate request must be signed - see MS-SMB2 
>>>>>>>> 3.2.5.5 */
>>>>>>>> +  if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
>>>>>>>> +  req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED;
>>>>>>>>rc = SendReceive2(xid, ses, iov, n_iov, &resp_buftype, flags, &rsp_iov);
>>>>>>>>cifs_small_buf_release(req);
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> This one needs to be backported to all stable kernels as the commit that
>>>>>>> introduced the regression:
>>>>>>> '
>>>>>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9
>>>>>>> SMB: Validate negotiate (to protect against downgrade) even if signing 
>>>>>>> off
>>>>>>>
>>>>>>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73
>>>>>>
>>>>>> Oh wait, it breaks the builds on older kernels, that's why I didn't
>>>>>> apply it :)
>>>>>>
>>>>>> Can you provide me with a working backport?
>>>>>>
>>>>>
>>>>> Hi Steve,
>>>>>
>>>>> Is there a version of this fix available for stable kernels?
>>>>>
>>>>
>>>> Hi Greg,
>>>>
>>>> Mounting SMB3 shares continues to fail for me on 4.4.118 and 4.9.84
>>>> due to the issues that I have described in detail on this mail thread.
>>>>
>>>> Since there is no apparent fix for this bug on stable kernels, could
>>>> you please consider reverting the original commit that caused this
>>>> regression?
>>>>
>>>> That commit was intended to enhance security, which is probably why it
>>>> was backported to stable kernels in the first place; but instead it
>>>> ends up breaking basic functionality itself (mounting). So in the
>>>> absence of a proper fix, I don't see much of an option but to revert
>>>> that commit.
>>>>
>>>> So, please consider reverting the following:
>>>>
>>>> commit 02ef29f9cbb616bf419 "SMB: Validate negotiate (to protect
>>>> against downgrade) even if signing off" on 4.4.118
>>>>
>>

Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-02-27 Thread Srivatsa S. Bhat
On 2/27/18 12:54 AM, Greg Kroah-Hartman wrote:
> On Mon, Feb 26, 2018 at 07:44:28PM -0800, Srivatsa S. Bhat wrote:
>> On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote:
>>> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote:
>>>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote:
>>>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman:
>>>>>> 4.13-stable review patch.  If anyone has any objections, please let me 
>>>>>> know.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> From: Steve French <smfre...@gmail.com>
>>>>>>
>>>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.
>>>>>>
>>>>>> According to MS-SMB2 3.2.55 validate_negotiate request must
>>>>>> always be signed. Some Windows can fail the request if you send it 
>>>>>> unsigned
>>>>>>
>>>>>> See kernel bugzilla bug 197311
>>>>>>
>>>>>> Acked-by: Ronnie Sahlberg 
>>>>>> Signed-off-by: Steve French <smfre...@gmail.com>
>>>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>>>>>>
>>>>>> ---
>>>>>>   fs/cifs/smb2pdu.c |3 +++
>>>>>>   1 file changed, 3 insertions(+)
>>>>>>
>>>>>> --- a/fs/cifs/smb2pdu.c
>>>>>> +++ b/fs/cifs/smb2pdu.c
>>>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc
>>>>>>  } else
>>>>>>  iov[0].iov_len = get_rfc1002_length(req) + 4;
>>>>>> +/* validate negotiate request must be signed - see MS-SMB2 
>>>>>> 3.2.5.5 */
>>>>>> +if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
>>>>>> +req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED;
>>>>>>  rc = SendReceive2(xid, ses, iov, n_iov, &resp_buftype, flags, &rsp_iov);
>>>>>>  cifs_small_buf_release(req);
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> This one needs to be backported to all stable kernels as the commit that
>>>>> introduced the regression:
>>>>> '
>>>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9
>>>>> SMB: Validate negotiate (to protect against downgrade) even if signing off
>>>>>
>>>>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73
>>>>
>>>> Oh wait, it breaks the builds on older kernels, that's why I didn't
>>>> apply it :)
>>>>
>>>> Can you provide me with a working backport?
>>>>
>>>
>>> Hi Steve,
>>>
>>> Is there a version of this fix available for stable kernels?
>>>
>>
>> Hi Greg,
>>
>> Mounting SMB3 shares continues to fail for me on 4.4.118 and 4.9.84
>> due to the issues that I have described in detail on this mail thread.
>>
>> Since there is no apparent fix for this bug on stable kernels, could
>> you please consider reverting the original commit that caused this
>> regression?
>>
>> That commit was intended to enhance security, which is probably why it
>> was backported to stable kernels in the first place; but instead it
>> ends up breaking basic functionality itself (mounting). So in the
>> absence of a proper fix, I don't see much of an option but to revert
>> that commit.
>>
>> So, please consider reverting the following:
>>
>> commit 02ef29f9cbb616bf419 "SMB: Validate negotiate (to protect
>> against downgrade) even if signing off" on 4.4.118
>>
>> commit 0e1b85a41a25ac888fb "SMB: Validate negotiate (to protect
>> against downgrade) even if signing off" on 4.9.84
>>
>> They correspond to commit 0603c96f3af50e2f9299fa410c224ab1d465e0f9
>> upstream. Both these patches should revert cleanly. 
> 
> Do you still have this same problem on 4.14 and 4.15?  If so, the issue
> needs to get fixed there, not papered-over by reverting these old
> changes, as you will hit the issue again in the future when you update
> to a newer kernel version.
> 

4.14 and 4.15 work great! (I had mentioned this in my original bug
report but forgot to summarize it here, sorry).

Thank you!

Regards,
Srivatsa


Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-02-26 Thread Srivatsa S. Bhat
On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote:
> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote:
>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote:
>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman:
>>>> 4.13-stable review patch.  If anyone has any objections, please let me 
>>>> know.
>>>>
>>>> --
>>>>
>>>> From: Steve French <smfre...@gmail.com>
>>>>
>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.
>>>>
>>>> According to MS-SMB2 3.2.55 validate_negotiate request must
>>>> always be signed. Some Windows can fail the request if you send it unsigned
>>>>
>>>> See kernel bugzilla bug 197311
>>>>
>>>> Acked-by: Ronnie Sahlberg 
>>>> Signed-off-by: Steve French <smfre...@gmail.com>
>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>>>>
>>>> ---
>>>>   fs/cifs/smb2pdu.c |3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> --- a/fs/cifs/smb2pdu.c
>>>> +++ b/fs/cifs/smb2pdu.c
>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc
>>>>} else
>>>>iov[0].iov_len = get_rfc1002_length(req) + 4;
>>>> +  /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
>>>> +  if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
>>>> +  req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED;
>>>>rc = SendReceive2(xid, ses, iov, n_iov, &resp_buftype, flags, &rsp_iov);
>>>>cifs_small_buf_release(req);
>>>>
>>>>
>>>>
>>>
>>> This one needs to be backported to all stable kernels as the commit that
>>> introduced the regression:
>>> '
>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9
>>> SMB: Validate negotiate (to protect against downgrade) even if signing off
>>>
>>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73
>>
>> Oh wait, it breaks the builds on older kernels, that's why I didn't
>> apply it :)
>>
>> Can you provide me with a working backport?
>>
> 
> Hi Steve,
> 
> Is there a version of this fix available for stable kernels?
> 

Hi Greg,

Mounting SMB3 shares continues to fail for me on 4.4.118 and 4.9.84
due to the issues that I have described in detail on this mail thread.

Since there is no apparent fix for this bug on stable kernels, could
you please consider reverting the original commit that caused this
regression?

That commit was intended to enhance security, which is probably why it
was backported to stable kernels in the first place; but instead it
ends up breaking basic functionality itself (mounting). So in the
absence of a proper fix, I don't see much of an option but to revert
that commit.

So, please consider reverting the following:

commit 02ef29f9cbb616bf419 "SMB: Validate negotiate (to protect
against downgrade) even if signing off" on 4.4.118

commit 0e1b85a41a25ac888fb "SMB: Validate negotiate (to protect
against downgrade) even if signing off" on 4.9.84

They correspond to commit 0603c96f3af50e2f9299fa410c224ab1d465e0f9
upstream. Both these patches should revert cleanly. 

Thank you!

Regards,
Srivatsa


Re: [PATCH 2/2] block, char_dev: Use correct format specifier for unsigned ints

2018-02-06 Thread Srivatsa S. Bhat
On 2/6/18 2:24 AM, Greg KH wrote:
> On Mon, Feb 05, 2018 at 06:25:27PM -0800, Srivatsa S. Bhat wrote:
>> From: Srivatsa S. Bhat <sriva...@csail.mit.edu>
>>
>> register_blkdev() and __register_chrdev_region() treat the major
>> number as an unsigned int. So print it the same way to avoid
>> absurd error statements such as:
>> "... major requested (-1) is greater than the maximum (511) ..."
>> (and also fix off-by-one bugs in the error prints).
>>
>> While at it, also update the comment describing register_blkdev().
>>
>> Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
>> ---
>>
>>  block/genhd.c |   19 +++
>>  fs/char_dev.c |4 ++--
>>  2 files changed, 13 insertions(+), 10 deletions(-)
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index 88a53c1..21a18e5 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -308,19 +308,22 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
>>  /**
>>   * register_blkdev - register a new block device
>>   *
>> - * @major: the requested major device number [1..255]. If @major = 0, try to
>> - * allocate any unused major number.
>> + * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
>> + * @major = 0, try to allocate any unused major number.
>>   * @name: the name of the new block device as a zero terminated string
>>   *
>>   * The @name must be unique within the system.
>>   *
>>   * The return value depends on the @major input parameter:
>>   *
>> - *  - if a major device number was requested in range [1..255] then the
>> - *function returns zero on success, or a negative error code
>> + *  - if a major device number was requested in range 
>> [1..BLKDEV_MAJOR_MAX-1]
>> + *then the function returns zero on success, or a negative error code
>>   *  - if any unused major number was requested with @major = 0 parameter
>>   *then the return value is the allocated major number in range
>> - *[1..255] or a negative error code otherwise
>> + *[1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise
>> + *
>> + * See Documentation/admin-guide/devices.txt for the list of allocated
>> + * major numbers.
>>   */
>>  int register_blkdev(unsigned int major, const char *name)
>>  {
>> @@ -347,8 +350,8 @@ int register_blkdev(unsigned int major, const char *name)
>>  }
>>  
>>  if (major >= BLKDEV_MAJOR_MAX) {
>> -pr_err("register_blkdev: major requested (%d) is greater than 
>> the maximum (%d) for %s\n",
>> -   major, BLKDEV_MAJOR_MAX, name);
>> +pr_err("register_blkdev: major requested (%u) is greater than 
>> the maximum (%u) for %s\n",
>> +   major, BLKDEV_MAJOR_MAX-1, name);
>>  
>>  ret = -EINVAL;
>>  goto out;
>> @@ -375,7 +378,7 @@ int register_blkdev(unsigned int major, const char *name)
>>  ret = -EBUSY;
>>  
>>  if (ret < 0) {
>> -printk("register_blkdev: cannot get major %d for %s\n",
>> +printk("register_blkdev: cannot get major %u for %s\n",
>> major, name);
>>  kfree(p);
>>  }
>> diff --git a/fs/char_dev.c b/fs/char_dev.c
>> index 33c9385..a279c58 100644
>> --- a/fs/char_dev.c
>> +++ b/fs/char_dev.c
>> @@ -121,8 +121,8 @@ __register_chrdev_region(unsigned int major, unsigned 
>> int baseminor,
>>  }
>>  
>>  if (major >= CHRDEV_MAJOR_MAX) {
>> -pr_err("CHRDEV \"%s\" major requested (%d) is greater than the 
>> maximum (%d)\n",
>> -   name, major, CHRDEV_MAJOR_MAX);
>> +pr_err("CHRDEV \"%s\" major requested (%u) is greater than the 
>> maximum (%u)\n",
>> +   name, major, CHRDEV_MAJOR_MAX-1);
>>  ret = -EINVAL;
>>  goto out;
>>  }
> 
> Thanks for both of these patches, if Al doesn't grab them, I will after
> 4.16-rc1 comes out.
> 

Sounds great! Thank you!

Regards,
Srivatsa


[PATCH 2/2] block, char_dev: Use correct format specifier for unsigned ints

2018-02-05 Thread Srivatsa S. Bhat
From: Srivatsa S. Bhat <sriva...@csail.mit.edu>

register_blkdev() and __register_chrdev_region() treat the major
number as an unsigned int. So print it the same way to avoid
absurd error statements such as:
"... major requested (-1) is greater than the maximum (511) ..."
(and also fix off-by-one bugs in the error prints).

While at it, also update the comment describing register_blkdev().

Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
---

 block/genhd.c |   19 +++
 fs/char_dev.c |4 ++--
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 88a53c1..21a18e5 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -308,19 +308,22 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
 /**
  * register_blkdev - register a new block device
  *
- * @major: the requested major device number [1..255]. If @major = 0, try to
- * allocate any unused major number.
+ * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
+ * @major = 0, try to allocate any unused major number.
  * @name: the name of the new block device as a zero terminated string
  *
  * The @name must be unique within the system.
  *
  * The return value depends on the @major input parameter:
  *
- *  - if a major device number was requested in range [1..255] then the
- *function returns zero on success, or a negative error code
+ *  - if a major device number was requested in range [1..BLKDEV_MAJOR_MAX-1]
+ *then the function returns zero on success, or a negative error code
  *  - if any unused major number was requested with @major = 0 parameter
  *then the return value is the allocated major number in range
- *[1..255] or a negative error code otherwise
+ *[1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise
+ *
+ * See Documentation/admin-guide/devices.txt for the list of allocated
+ * major numbers.
  */
 int register_blkdev(unsigned int major, const char *name)
 {
@@ -347,8 +350,8 @@ int register_blkdev(unsigned int major, const char *name)
}
 
if (major >= BLKDEV_MAJOR_MAX) {
-   pr_err("register_blkdev: major requested (%d) is greater than 
the maximum (%d) for %s\n",
-  major, BLKDEV_MAJOR_MAX, name);
+   pr_err("register_blkdev: major requested (%u) is greater than 
the maximum (%u) for %s\n",
+  major, BLKDEV_MAJOR_MAX-1, name);
 
ret = -EINVAL;
goto out;
@@ -375,7 +378,7 @@ int register_blkdev(unsigned int major, const char *name)
ret = -EBUSY;
 
if (ret < 0) {
-   printk("register_blkdev: cannot get major %d for %s\n",
+   printk("register_blkdev: cannot get major %u for %s\n",
               major, name);
kfree(p);
}
diff --git a/fs/char_dev.c b/fs/char_dev.c
index 33c9385..a279c58 100644
--- a/fs/char_dev.c
+++ b/fs/char_dev.c
@@ -121,8 +121,8 @@ __register_chrdev_region(unsigned int major, unsigned int 
baseminor,
}
 
if (major >= CHRDEV_MAJOR_MAX) {
-   pr_err("CHRDEV \"%s\" major requested (%d) is greater than the 
maximum (%d)\n",
-  name, major, CHRDEV_MAJOR_MAX);
+   pr_err("CHRDEV \"%s\" major requested (%u) is greater than the 
maximum (%u)\n",
+  name, major, CHRDEV_MAJOR_MAX-1);
ret = -EINVAL;
goto out;
}
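
As an aside, here is a tiny userspace sketch (illustrative only, not part of
the patch) of why %d produces the misleading "-1" on typical ABIs when a
caller passes UINT_MAX:

#include <stdio.h>
#include <limits.h>

int main(void)
{
	unsigned int major = UINT_MAX;	/* the edge case the LTP test passes in */

	/* %d reinterprets the bits as signed, so this prints "major requested (-1)" */
	printf("major requested (%d)\n", major);

	/* %u prints what the caller actually asked for: 4294967295 */
	printf("major requested (%u)\n", major);

	return 0;
}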



[PATCH 1/2] char_dev: Fix off-by-one bugs in find_dynamic_major()

2018-02-05 Thread Srivatsa S. Bhat
From: Srivatsa S. Bhat <sriva...@csail.mit.edu>

CHRDEV_MAJOR_DYN_END and CHRDEV_MAJOR_DYN_EXT_END are valid major
numbers. So fix the loop iteration to include them in the search for
free major numbers.

While at it, also remove a redundant if condition ("cd->major != i"),
as it will never be true.

Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
---

 fs/char_dev.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/char_dev.c b/fs/char_dev.c
index a65e4a5..33c9385 100644
--- a/fs/char_dev.c
+++ b/fs/char_dev.c
@@ -67,18 +67,18 @@ static int find_dynamic_major(void)
int i;
struct char_device_struct *cd;
 
-   for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) {
+   for (i = ARRAY_SIZE(chrdevs)-1; i >= CHRDEV_MAJOR_DYN_END; i--) {
if (chrdevs[i] == NULL)
return i;
}
 
for (i = CHRDEV_MAJOR_DYN_EXT_START;
-i > CHRDEV_MAJOR_DYN_EXT_END; i--) {
+i >= CHRDEV_MAJOR_DYN_EXT_END; i--) {
for (cd = chrdevs[major_to_index(i)]; cd; cd = cd->next)
if (cd->major == i)
break;
 
-   if (cd == NULL || cd->major != i)
+   if (cd == NULL)
return i;
}
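
To make the boundary issue concrete, a small userspace sketch (illustrative
only; the constants are assumed from the fs.h headers of this era, where the
dynamic char range ends at major 234 and the chrdevs hash table has 255 slots):

#include <stdio.h>

#define CHRDEV_MAJOR_DYN_END	234	/* assumed value, for illustration */
#define HASH_SIZE		255	/* stand-in for ARRAY_SIZE(chrdevs) */

int main(void)
{
	int i, old_probes = 0, fixed_probes = 0;

	/* old loop: never tests major 234, even though it is a valid dynamic major */
	for (i = HASH_SIZE - 1; i > CHRDEV_MAJOR_DYN_END; i--)
		old_probes++;

	/* fixed loop: includes major 234 in the search */
	for (i = HASH_SIZE - 1; i >= CHRDEV_MAJOR_DYN_END; i--)
		fixed_probes++;

	printf("majors probed: old=%d fixed=%d\n", old_probes, fixed_probes); /* 20 vs 21 */
	return 0;
}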
 



Re: Change in register_blkdev() behavior

2018-02-01 Thread Srivatsa S. Bhat
On 2/1/18 5:10 PM, Logan Gunthorpe wrote:
> 
> 
> On 01/02/18 05:23 PM, Srivatsa S. Bhat wrote:
>> static int find_dynamic_major(void)
>> {
>>  int i;
>>  struct char_device_struct *cd;
>>
>>  for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) {
>>   
>> As far as I can see, _DYN_END is inclusive, so shouldn't this be >= ?
> 
> Yes, it looks like _DYN_END should have been inclusive based on the way I 
> documented it.
> 

Thank you for confirming! I'll send a patch to fix that (and the analogous
case for CHRDEV_MAJOR_DYN_EXT_END).

>>
>>  for (cd = chrdevs[major_to_index(i)]; cd; cd = cd->next)
>>  if (cd->major == i)
>>  break;
>>
>>  if (cd == NULL || cd->major != i)
>>   
>> It seems that this latter condition is unnecessary, as it will never be
>> true. (We'll reach here only if cd == NULL or cd->major == i).
> 
> Not quite. chrdevs[] may contain majors that also hit on the hash but don't 
> equal 'i'. So the for loop will iterate through all hashes matching 'i' and 
> if there is one or more and they all don't match 'i', it will fall through 
> the loop and cd will be set to something non-null and not equal to i.
> 

Hmm, the code doesn't appear to be doing that though? The loop's fall
through occurs one past the last entry, when cd == NULL. The only
other way it can exit the loop is if it hits the break statement
(which implies that cd->major == i). So what am I missing?
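
For what it's worth, here is a minimal userspace sketch of that walk (with a
simplified struct and made-up majors that collide on the i % 255 hash) showing
that falling out of the loop always leaves cd == NULL:

#include <assert.h>
#include <stddef.h>

struct char_device_struct {
	struct char_device_struct *next;
	unsigned int major;
};

/* same shape as the kernel loop: break on an exact match, else walk off the end */
static struct char_device_struct *walk(struct char_device_struct *bucket,
				       unsigned int i)
{
	struct char_device_struct *cd;

	for (cd = bucket; cd; cd = cd->next)
		if (cd->major == i)
			break;
	return cd;	/* either NULL, or cd->major == i */
}

int main(void)
{
	/* majors 555 and 45 share hash slot 45 with major 300, but differ from it */
	struct char_device_struct b = { NULL, 45 };
	struct char_device_struct a = { &b, 555 };

	assert(walk(&a, 300) == NULL);	/* no non-NULL, non-matching cd is ever returned */
	assert(walk(&a, 555) == &a);	/* an exact match is returned via the break */
	return 0;
}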

Regards,
Srivatsa


Re: Change in register_blkdev() behavior

2018-02-01 Thread Srivatsa S. Bhat
On 1/31/18 6:24 AM, Greg KH wrote:
> On Tue, Jan 30, 2018 at 04:56:32PM -0800, Srivatsa S. Bhat wrote:
>>
>> Hi,
>>
>> Before commit 133d55cdb2f "block: order /proc/devices by major number",
>> if register_blkdev() was called with major = [1..UINT_MAX], it used to
>> succeed (provided the requested major number was actually free).
> 
> How was LTP calling register_blkdev() with such crazy numbers?
> 

Haha :-) No idea!

> Anyway, I agree with Logan, this sounds like something to be resolved in
> LTP, as allowing block devices with numbers greater than the number we
> really allow seems like an odd requirement :)
> 

Agreed! And thanks for confirming!

Regards,
Srivatsa


Re: Change in register_blkdev() behavior

2018-02-01 Thread Srivatsa S. Bhat
Hi Logan,

On 1/30/18 5:26 PM, Logan Gunthorpe wrote:
> 
> 
> On 30/01/18 05:56 PM, Srivatsa S. Bhat wrote:
>> If the restriction on the major number was intentional, perhaps we
>> should get the LTP testcase modified for kernel versions >= 4.14.
>> Otherwise, we should fix register_blkdev to preserve the old behavior.
>> (I guess the same thing applies to commit 8a932f73e5b "char_dev: order
>> /proc/devices by major number" as well).
> 
> The restriction was put in place so the code that prints the devices doesn't 
> have to run through every integer in order to print the devices in order.
> 
> Given the existing documented fixed numbers in [1] and that future new char 
> devices should be using dynamic allocation, this seemed like a reasonable 
> restriction.
> 
> It would be pretty trivial to increase the limit but, IMO, setting it to 
> UINT_MAX seems a bit much. Especially given that a lot of the documentation 
> and code still very much has remnants of the 256 limit. (The series that 
> included this patch only just expanded the char dynamic range to  above 256). 
> So, I'd suggest the LTP test should change.
> 

Sounds good! Thank you!

By the way, I happened to notice a few minor issues with the
find_dynamic_major() function added by this patch series:

static int find_dynamic_major(void)
{
int i;
struct char_device_struct *cd;

for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) {
As far as I can see, _DYN_END is inclusive, so shouldn't this be >= ?

if (chrdevs[i] == NULL)
return i;
}

for (i = CHRDEV_MAJOR_DYN_EXT_START;
 i > CHRDEV_MAJOR_DYN_EXT_END; i--) {
Same here; I believe this should be >=

for (cd = chrdevs[major_to_index(i)]; cd; cd = cd->next)
if (cd->major == i)
break;

if (cd == NULL || cd->major != i)
It seems that this latter condition is unnecessary, as it will never be
true. (We'll reach here only if cd == NULL or cd->major == i).

return i;
}

return -EBUSY;
}

Regards,
Srivatsa


Change in register_blkdev() behavior

2018-01-30 Thread Srivatsa S. Bhat

Hi,

Before commit 133d55cdb2f "block: order /proc/devices by major number",
if register_blkdev() was called with major = [1..UINT_MAX], it used to
succeed (provided the requested major number was actually free).

However, while fixing the ordering in /proc/devices, commit 133d55cdb2f
also added this change:

@@ -309,6 +309,14 @@ int register_blkdev(unsigned int major, const char *name)
ret = major;
}
 
+   if (major >= BLKDEV_MAJOR_MAX) {
+   pr_err("register_blkdev: major requested (%d) is greater than 
the maximum (%d) for %s\n",
+  major, BLKDEV_MAJOR_MAX, name);
+
+   ret = -EINVAL;
+   goto out;
+   }
+
p = kmalloc(sizeof(struct blk_major_name), GFP_KERNEL);
if (p == NULL) {
ret = -ENOMEM;

So, after this commit, calls to register_blkdev() fail if the requested
major number is >= 512 (BLKDEV_MAJOR_MAX). I'm wondering if this was an
intentional change or not, as it wasn't explicitly called out in the
changelog (and the comment on top of register_blkdev() describing its
inputs seems quite out-of-date). This also breaks LTP testcase
block_dev/tc05, which tests for edge-cases and expects register_blkdev()
to succeed with major=UINT_MAX.
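
For reference, a minimal throwaway module along these lines reproduces the
difference (the module is hypothetical and only sketches what the LTP case
exercises; it is not something I'm proposing to merge):

#include <linux/module.h>
#include <linux/fs.h>

static int __init blkdev_edge_test_init(void)
{
	/* succeeded before 133d55cdb2f (if the major was free); -EINVAL after it */
	int ret = register_blkdev(UINT_MAX, "blkdev_edge_test");

	pr_info("register_blkdev(UINT_MAX, ...) returned %d\n", ret);
	if (ret >= 0)
		unregister_blkdev(UINT_MAX, "blkdev_edge_test");
	return 0;
}

static void __exit blkdev_edge_test_exit(void)
{
}

module_init(blkdev_edge_test_init);
module_exit(blkdev_edge_test_exit);
MODULE_LICENSE("GPL");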

If the restriction on the major number was intentional, perhaps we
should get the LTP testcase modified for kernel versions >= 4.14.
Otherwise, we should fix register_blkdev to preserve the old behavior.
(I guess the same thing applies to commit 8a932f73e5b "char_dev: order
/proc/devices by major number" as well).

Thoughts?

Regards,
Srivatsa


Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-01-29 Thread Srivatsa S. Bhat
Hi Aurélien,

On 1/19/18 5:23 AM, Aurélien Aptel wrote:
> Hi,
> 
> "Srivatsa S. Bhat" <sriva...@csail.mit.edu> writes:
>>> Any thoughts on what is the right fix for stable kernels? Mounting SMB3
>>> shares works great on mainline (v4.15-rc5). It also works on 4.4.109 if
>>> I pass the sec=ntlmsspi option to the mount command (as opposed to the
>>> default: sec=ntlmssp). Please let me know if you need any other info.
> 
> Make sure you have (in that order):
> 
> db3b5474f462 ("CIFS: Fix NULL pointer deref on SMB2_tcon() failure")
> fe83bebc0522 ("SMB: fix leak of validate negotiate info response buffer")
> a2d9daad1d2d ("SMB: fix validate negotiate info uninitialised memory use")
> 4587eee04e2a ("SMB3: Validate negotiate request must always be signed")
> a821df3f1af7 ("cifs: fix NULL deref in SMB2_read")
> 
> Does enabling CIFS_SMB311 changes anything?
> 

Thank you for looking into this. I tried applying these patches on top 
of 4.4.113 and 4.9.78, but that didn't fix the problem on either kernel,
with or without CONFIG_CIFS_SMB311 enabled.

(By the way, shouldn't these patches be applied to stable kernels anyway?
I was a bit surprised that none of them are present in 4.4.113 and 4.9.78).

> I also suspect some things assume encryption patches are in.
> 

Do you happen to know which patches they might be? In any case, I'm using
the latest (unmodified) 4.4 and 4.9 stable kernels, so I hope the necessary
support is already present in them.

The 5 patches you suggested above needed a bit of fixup by hand for 4.4.113,
so I have shared my combined patch below for reference, which applies
cleanly on top of 4.4.113. (The same patch applies on 4.9.78 as well, with
some minor line-number differences).

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index f2ff60e..92abb8b9 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -519,7 +519,7 @@ int smb3_validate_negotiate(const unsigned int xid, struct 
cifs_tcon *tcon)
 {
int rc = 0;
struct validate_negotiate_info_req vneg_inbuf;
-   struct validate_negotiate_info_rsp *pneg_rsp;
+   struct validate_negotiate_info_rsp *pneg_rsp = NULL;
u32 rsplen;
 
cifs_dbg(FYI, "validate negotiate\n");
@@ -575,8 +575,9 @@ int smb3_validate_negotiate(const unsigned int xid, struct 
cifs_tcon *tcon)
 rsplen);
 
/* relax check since Mac returns max bufsize allowed on ioctl */
-   if (rsplen > CIFSMaxBufSize)
-   return -EIO;
+   if ((rsplen > CIFSMaxBufSize)
+   || (rsplen < sizeof(struct 
validate_negotiate_info_rsp)))
+   goto err_rsp_free;
}
 
/* check validate negotiate info response matches what we got earlier */
@@ -595,10 +596,13 @@ int smb3_validate_negotiate(const unsigned int xid, 
struct cifs_tcon *tcon)
 
/* validate negotiate successful */
cifs_dbg(FYI, "validate negotiate info successful\n");
+   kfree(pneg_rsp);
return 0;
 
 vneg_out:
cifs_dbg(VFS, "protocol revalidation - security settings mismatch\n");
+err_rsp_free:
+   kfree(pneg_rsp);
return -EIO;
 }
 
@@ -1042,7 +1046,7 @@ tcon_exit:
return rc;
 
 tcon_error_exit:
-   if (rsp->hdr.Status == STATUS_BAD_NETWORK_NAME) {
+   if (rsp && rsp->hdr.Status == STATUS_BAD_NETWORK_NAME) {
cifs_dbg(VFS, "BAD_NETWORK_NAME: %s\n", tree);
}
goto tcon_exit;
@@ -1559,6 +1563,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon 
*tcon, u64 persistent_fid,
} else
iov[0].iov_len = get_rfc1002_length(req) + 4;
 
+   /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
+   if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
+   req->hdr.Flags |= SMB2_FLAGS_SIGNED;
 
	rc = SendReceive2(xid, ses, iov, num_iovecs, &resp_buftype, 0);
rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base;
@@ -2159,23 +2166,22 @@ SMB2_read(const unsigned int xid, struct cifs_io_parms 
*io_parms,
 
rsp = (struct smb2_read_rsp *)iov[0].iov_base;
 
-   if (rsp->hdr.Status == STATUS_END_OF_FILE) {
+   if (rc) {
+   if (rc != -ENODATA) {
+   cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE);
+   cifs_dbg(VFS, "Send error in read = %d\n", rc);
+   }
free_rsp_buf(resp_buftype, iov[0].iov_base);
-   return 0;
+   return rc == -ENODATA ? 0 : rc;
}
 
-   if (rc) {
-   cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE);
-   cifs_dbg(VFS, "Send error in read = %d\n", rc);
-   } else {
-   *nbytes = le32_to_cpu(rsp-&

Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-01-29 Thread Srivatsa S. Bhat
Hi Aurélien,

On 1/19/18 5:23 AM, Aurélien Aptel wrote:
> Hi,
> 
> "Srivatsa S. Bhat"  writes:
>>> Any thoughts on what is the right fix for stable kernels? Mounting SMB3
>>> shares works great on mainline (v4.15-rc5). It also works on 4.4.109 if
>>> I pass the sec=ntlmsspi option to the mount command (as opposed to the
>>> default: sec=ntlmssp). Please let me know if you need any other info.
> 
> Make sure you have (in that order):
> 
> db3b5474f462 ("CIFS: Fix NULL pointer deref on SMB2_tcon() failure")
> fe83bebc0522 ("SMB: fix leak of validate negotiate info response buffer")
> a2d9daad1d2d ("SMB: fix validate negotiate info uninitialised memory use")
> 4587eee04e2a ("SMB3: Validate negotiate request must always be signed")
> a821df3f1af7 ("cifs: fix NULL deref in SMB2_read")
> 
> Does enabling CIFS_SMB311 changes anything?
> 

Thank you for looking into this. I tried applying these patches on top 
of 4.4.113 and 4.9.78, but that didn't fix the problem on either kernel,
with or without CONFIG_CIFS_SMB311 enabled.

(By the way, shouldn't these patches be applied to stable kernels anyway?
I was a bit surprised that none of them are present in 4.4.113 and 4.9.78).

> I also suspect some things assume encryption patches are in.
> 

Do you happen to know which patches they might be? In any case, I'm using
the latest (unmodified) 4.4 and 4.9 stable kernels, so I hope the necessary
support is already present in them.

The 5 patches you suggested above needed a bit of fixup by hand for 4.4.113,
so I have shared my combined patch below for reference, which applies
cleanly on top of 4.4.113. (The same patch applies on 4.9.78 as well, with
some minor line-number differences).

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index f2ff60e..92abb8b9 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -519,7 +519,7 @@ int smb3_validate_negotiate(const unsigned int xid, struct 
cifs_tcon *tcon)
 {
int rc = 0;
struct validate_negotiate_info_req vneg_inbuf;
-   struct validate_negotiate_info_rsp *pneg_rsp;
+   struct validate_negotiate_info_rsp *pneg_rsp = NULL;
u32 rsplen;
 
cifs_dbg(FYI, "validate negotiate\n");
@@ -575,8 +575,9 @@ int smb3_validate_negotiate(const unsigned int xid, struct 
cifs_tcon *tcon)
 rsplen);
 
/* relax check since Mac returns max bufsize allowed on ioctl */
-   if (rsplen > CIFSMaxBufSize)
-   return -EIO;
+   if ((rsplen > CIFSMaxBufSize)
+   || (rsplen < sizeof(struct 
validate_negotiate_info_rsp)))
+   goto err_rsp_free;
}
 
/* check validate negotiate info response matches what we got earlier */
@@ -595,10 +596,13 @@ int smb3_validate_negotiate(const unsigned int xid, 
struct cifs_tcon *tcon)
 
/* validate negotiate successful */
cifs_dbg(FYI, "validate negotiate info successful\n");
+   kfree(pneg_rsp);
return 0;
 
 vneg_out:
cifs_dbg(VFS, "protocol revalidation - security settings mismatch\n");
+err_rsp_free:
+   kfree(pneg_rsp);
return -EIO;
 }
 
@@ -1042,7 +1046,7 @@ tcon_exit:
return rc;
 
 tcon_error_exit:
-   if (rsp->hdr.Status == STATUS_BAD_NETWORK_NAME) {
+   if (rsp && rsp->hdr.Status == STATUS_BAD_NETWORK_NAME) {
cifs_dbg(VFS, "BAD_NETWORK_NAME: %s\n", tree);
}
goto tcon_exit;
@@ -1559,6 +1563,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon 
*tcon, u64 persistent_fid,
} else
iov[0].iov_len = get_rfc1002_length(req) + 4;
 
+   /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
+   if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
+   req->hdr.Flags |= SMB2_FLAGS_SIGNED;
 
rc = SendReceive2(xid, ses, iov, num_iovecs, &resp_buftype, 0);
rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base;
@@ -2159,23 +2166,22 @@ SMB2_read(const unsigned int xid, struct cifs_io_parms 
*io_parms,
 
rsp = (struct smb2_read_rsp *)iov[0].iov_base;
 
-   if (rsp->hdr.Status == STATUS_END_OF_FILE) {
+   if (rc) {
+   if (rc != -ENODATA) {
+   cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE);
+   cifs_dbg(VFS, "Send error in read = %d\n", rc);
+   }
free_rsp_buf(resp_buftype, iov[0].iov_base);
-   return 0;
+   return rc == -ENODATA ? 0 : rc;
}
 
-   if (rc) {
-   cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE);
-   cifs_dbg(VFS, "Send error in read = %d\n", rc);
-   } else {
-   *nbytes = le32_to_cpu(rsp->DataLength);
-   if 

Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-01-18 Thread Srivatsa S. Bhat
On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote:
> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote:
>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote:
>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman:
>>>> 4.13-stable review patch.  If anyone has any objections, please let me 
>>>> know.
>>>>
>>>> --
>>>>
>>>> From: Steve French <smfre...@gmail.com>
>>>>
>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.
>>>>
>>>> According to MS-SMB2 3.2.55 validate_negotiate request must
>>>> always be signed. Some Windows can fail the request if you send it unsigned
>>>>
>>>> See kernel bugzilla bug 197311
>>>>
>>>> Acked-by: Ronnie Sahlberg 
>>>> Signed-off-by: Steve French <smfre...@gmail.com>
>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>>>>
>>>> ---
>>>>   fs/cifs/smb2pdu.c |3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> --- a/fs/cifs/smb2pdu.c
>>>> +++ b/fs/cifs/smb2pdu.c
>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc
>>>>} else
>>>>iov[0].iov_len = get_rfc1002_length(req) + 4;
>>>> +  /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
>>>> +  if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
>>>> +  req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED;
>>>>rc = SendReceive2(xid, ses, iov, n_iov, &resp_buftype, flags, &rsp_iov);
>>>>cifs_small_buf_release(req);
>>>>
>>>>
>>>>
>>>
>>> This one needs to be backported to all stable kernels as the commit that
>>> introduced the regression:
>>> '
>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9
>>> SMB: Validate negotiate (to protect against downgrade) even if signing off
>>>
>>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73
>>
>> Oh wait, it breaks the builds on older kernels, that's why I didn't
>> apply it :)
>>
>> Can you provide me with a working backport?
>>
> 
> Hi Steve,
> 
> Is there a version of this fix available for stable kernels?
> 

Any thoughts on this?

Regards,
Srivatsa

> I tried applying this patch to 4.4.109 (and a similar one for 4.9.74),
> but it didn't fix the problem.  Instead, I actually got a NULL pointer
> dereference when I tried to mount an SMB3 share.
> 
> Here is the patch I tried on 4.4.109:
> 
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index f2ff60e..3963bd2 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -1559,6 +1559,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon 
> *tcon, u64 persistent_fid,
> } else
> iov[0].iov_len = get_rfc1002_length(req) + 4;
>  
> +   /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
> +   if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
> +   req->hdr.Flags |= SMB2_FLAGS_SIGNED;
>  
> rc = SendReceive2(xid, ses, iov, num_iovecs, &resp_buftype, 0);
> rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base;
> 
> 
> This results in the following NULL pointer dereference when I try
> mounting:
> 
> # mount -vvv -t cifs -o vers=3.0,credentials=.smbcred ///TestSMB/ 
> testdir
> 
> [   53.073057] BUG: unable to handle kernel NULL pointer dereference at 
> 0050
> [   53.073511] IP: [] crypto_shash_setkey+0x1a/0xc0
> [   53.073973] PGD 0 
> [   53.074427] Oops:  [#1] SMP 
> [   53.074946] Modules linked in: arc4(E) ecb(E) md4(E) cifs(E) 
> dns_resolver(E) vmw_vsock_vmci_transport(E) vsock(E) hid_generic(E) usbhid(E) 
> hid(E) xt_conntrack(E) mousedev(E) iptable_nat(E) nf_conntrack_ipv4(E) 
> nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) iptable_filter(E) ip_tables(E) 
> crc32c_intel(E) xt_LOG(E) nf_conntrack(E) jitterentropy_rng(E) hmac(E) 
> sha256_ssse3(E) sha256_generic(E) drbg(E) vmw_balloon(E) ansi_cprng(E) 
> aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) 
> cryptd(E) psmouse(E) evdev(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) usbcore(E) 
> intel_agp(E) usb_common(E) vmw_vmci(E) i2c_piix4(E) intel_gtt(E) nfit(E) 
> battery(E) tpm_tis(E) tpm(E) ac(E) button(E) sch_fq_codel(E) autofs4(E)
> [   53.079435] CPU: 3 PID: 829 Comm: mount.cifs Tainted: GE   
> 4.4.109-possible-fix1+ #21
> [   53.079983] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
> Desktop Reference Platform, BIOS 6.00 04/05/2016
> 

Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-01-03 Thread Srivatsa S. Bhat
On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote:
> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote:
>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman:
>>> 4.13-stable review patch.  If anyone has any objections, please let me know.
>>>
>>> --
>>>
>>> From: Steve French 
>>>
>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.
>>>
>>> According to MS-SMB2 3.2.55 validate_negotiate request must
>>> always be signed. Some Windows can fail the request if you send it unsigned
>>>
>>> See kernel bugzilla bug 197311
>>>
>>> Acked-by: Ronnie Sahlberg 
>>> Signed-off-by: Steve French 
>>> Signed-off-by: Greg Kroah-Hartman 
>>>
>>> ---
>>>   fs/cifs/smb2pdu.c |3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> --- a/fs/cifs/smb2pdu.c
>>> +++ b/fs/cifs/smb2pdu.c
>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc
>>> } else
>>> iov[0].iov_len = get_rfc1002_length(req) + 4;
>>> +   /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
>>> +   if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
>>> +   req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED;
>>> rc = SendReceive2(xid, ses, iov, n_iov, &resp_buftype, flags, &rsp_iov);
>>> cifs_small_buf_release(req);
>>>
>>>
>>>
>>
>> This one needs to be backported to all stable kernels as the commit that
>> introduced the regression:
>> '
>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9
>> SMB: Validate negotiate (to protect against downgrade) even if signing off
>>
>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73
> 
> Oh wait, it breaks the builds on older kernels, that's why I didn't
> apply it :)
> 
> Can you provide me with a working backport?
> 

Hi Steve,

Is there a version of this fix available for stable kernels?

I tried applying this patch to 4.4.109 (and a similar one for 4.9.74),
but it didn't fix the problem.  Instead, I actually got a NULL pointer
dereference when I tried to mount an SMB3 share.

Here is the patch I tried on 4.4.109:

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index f2ff60e..3963bd2 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1559,6 +1559,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon 
*tcon, u64 persistent_fid,
} else
iov[0].iov_len = get_rfc1002_length(req) + 4;
 
+   /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
+   if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
+   req->hdr.Flags |= SMB2_FLAGS_SIGNED;
 
rc = SendReceive2(xid, ses, iov, num_iovecs, &resp_buftype, 0);
rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base;


This results in the following NULL pointer dereference when I try
mounting:

# mount -vvv -t cifs -o vers=3.0,credentials=.smbcred ///TestSMB/ 
testdir

[   53.073057] BUG: unable to handle kernel NULL pointer dereference at 
0050
[   53.073511] IP: [] crypto_shash_setkey+0x1a/0xc0
[   53.073973] PGD 0 
[   53.074427] Oops:  [#1] SMP 
[   53.074946] Modules linked in: arc4(E) ecb(E) md4(E) cifs(E) dns_resolver(E) 
vmw_vsock_vmci_transport(E) vsock(E) hid_generic(E) usbhid(E) hid(E) 
xt_conntrack(E) mousedev(E) iptable_nat(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) iptable_filter(E) ip_tables(E) 
crc32c_intel(E) xt_LOG(E) nf_conntrack(E) jitterentropy_rng(E) hmac(E) 
sha256_ssse3(E) sha256_generic(E) drbg(E) vmw_balloon(E) ansi_cprng(E) 
aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) 
cryptd(E) psmouse(E) evdev(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) usbcore(E) 
intel_agp(E) usb_common(E) vmw_vmci(E) i2c_piix4(E) intel_gtt(E) nfit(E) 
battery(E) tpm_tis(E) tpm(E) ac(E) button(E) sch_fq_codel(E) autofs4(E)
[   53.079435] CPU: 3 PID: 829 Comm: mount.cifs Tainted: GE   
4.4.109-possible-fix1+ #21
[   53.079983] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 04/05/2016
[   53.081086] task: 8800b4f41940 ti: 8800b92ac000 task.ti: 
8800b92ac000
[   53.081667] RIP: 0010:[]  [] 
crypto_shash_setkey+0x1a/0xc0
[   53.082247] RSP: 0018:8800b92af9a8  EFLAGS: 00010282
[   53.082604] systemd-journald[284]: Compressed data object 721 -> 468 using XZ
[   53.083419] RAX: 8800af5943c0 RBX: 8800b484a800 RCX: 0ec7
[   53.084001] RDX: 0010 RSI: 8800b900af18 RDI: 
[   53.084602] RBP: 8800b92af9e0 R08: 8800b92afb64 R09: 
[   53.085184] R10: 3031322e3030312e R11: 07f5 R12: 0002
[   53.085755] R13:  R14: 8800b900af18 R15: 0010
[   53.086333] FS:  7fb659b45740() GS:88013fcc() 
knlGS:
[   53.086907] CS:  0010 DS:  ES:  CR0: 80050033
[   53.087480] CR2: 0050 CR3: b797 CR4: 001606e0
[   53.088107] Stack:
[   53.088681]  
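(The trace above is cut off in the archive.) For what it's worth, the faulting call is crypto_shash_setkey() with what looks like a NULL transform: RDI is zero in the register dump while RDX is 0x10, i.e. a 16-byte key length, which points at the SMB3 signing path. That suggests the signing shash was never allocated before the forced-signing ioctl went out. The snippet below is a purely illustrative guard, not a tested fix; the names tfm, key and keylen are placeholders for whichever secmech transform and session key the signing code uses here (perhaps server->secmech.cmacaes), so treat it only as the shape of a defensive check:

	/*
	 * Illustrative sketch only: fail the request cleanly if the signing
	 * transform was never set up, instead of passing a NULL tfm to
	 * crypto_shash_setkey().  "tfm", "key" and "keylen" are placeholders.
	 */
	if (!tfm) {
		cifs_dbg(VFS, "%s: signing requested but no shash tfm allocated\n",
			 __func__);
		return -EIO;
	}
	rc = crypto_shash_setkey(tfm, key, keylen);

Such a check would only turn the oops into a clean -EIO, of course; the underlying question remains which prerequisite signing patches are missing from 4.4.113.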

Re: [tip:smp/hotplug] cpu/hotplug: Restructure FROZEN state handling

2016-03-02 Thread Srivatsa S. Bhat
On 3/1/16 2:51 PM, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  090e77c391dd983c8945b8e2e16d09f378d2e334
> Gitweb: http://git.kernel.org/tip/090e77c391dd983c8945b8e2e16d09f378d2e334
> Author: Thomas Gleixner <t...@linutronix.de>
> AuthorDate: Fri, 26 Feb 2016 18:43:23 +
> Committer:  Thomas Gleixner <t...@linutronix.de>
> CommitDate: Tue, 1 Mar 2016 20:36:53 +0100
> 
> cpu/hotplug: Restructure FROZEN state handling
> 
> There are only a few callbacks which really care about FROZEN
> vs. !FROZEN. No need to have extra states for this.
> 
> Publish the frozen state in an extra variable which is updated under
> the hotplug lock and let the users interested deal with it w/o
> imposing that extra state checks on everyone.
> 
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> Cc: linux-a...@vger.kernel.org
> Cc: Rik van Riel <r...@redhat.com>
> Cc: Rafael Wysocki <rafael.j.wyso...@intel.com>
> Cc: "Srivatsa S. Bhat" <sriva...@mit.edu>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Arjan van de Ven <ar...@linux.intel.com>
> Cc: Sebastian Siewior <bige...@linutronix.de>
> Cc: Rusty Russell <ru...@rustcorp.com.au>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Oleg Nesterov <o...@redhat.com>
> Cc: Tejun Heo <t...@kernel.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Paul McKenney <paul...@linux.vnet.ibm.com>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Paul Turner <p...@google.com>
> Link: http://lkml.kernel.org/r/20160226182340.334912...@linutronix.de
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> ---
>  include/linux/cpu.h |  2 ++
>  kernel/cpu.c| 69 
> ++---
>  2 files changed, 31 insertions(+), 40 deletions(-)
> 
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index d2ca8c3..f2fb549 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -118,6 +118,7 @@ enum {
>  
>  
>  #ifdef CONFIG_SMP
> +extern bool cpuhp_tasks_frozen;
>  /* Need to know about CPUs going up/down? */
>  #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE)
>  #define cpu_notifier(fn, pri) {  \
> @@ -177,6 +178,7 @@ extern void cpu_maps_update_done(void);
>  #define cpu_notifier_register_done   cpu_maps_update_done
>  
>  #else/* CONFIG_SMP */
> +#define cpuhp_tasks_frozen   0
>  
>  #define cpu_notifier(fn, pri)do { (void)(fn); } while (0)
>  #define __cpu_notifier(fn, pri)  do { (void)(fn); } while (0)
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 5b9d396..41a6cb8 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -29,6 +29,8 @@
>  #ifdef CONFIG_SMP
>  /* Serializes the updates to cpu_online_mask, cpu_present_mask */
>  static DEFINE_MUTEX(cpu_add_remove_lock);
> +bool cpuhp_tasks_frozen;
> +EXPORT_SYMBOL_GPL(cpuhp_tasks_frozen);
>  

One small nitpick though: we don't need to export this symbol yet; it can
be deferred until the callbacks that need it are actually modified to use
this value (presumably in a later patchset).
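To make the nitpick concrete, the deferred form would be something like the sketch below (illustrative only, not a patch):

	/* Keep the flag kernel-internal for now; there are no modular users yet. */
	bool cpuhp_tasks_frozen;

	/*
	 * EXPORT_SYMBOL_GPL(cpuhp_tasks_frozen) can be added later, in the same
	 * series that converts modular hotplug callbacks to read this flag.
	 */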

Regards,
Srivatsa S. Bhat


Re: [tip:smp/hotplug] cpu/hotplug: Restructure FROZEN state handling

2016-03-02 Thread Srivatsa S. Bhat
On 3/1/16 2:51 PM, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  090e77c391dd983c8945b8e2e16d09f378d2e334
> Gitweb: http://git.kernel.org/tip/090e77c391dd983c8945b8e2e16d09f378d2e334
> Author: Thomas Gleixner <t...@linutronix.de>
> AuthorDate: Fri, 26 Feb 2016 18:43:23 +
> Committer:  Thomas Gleixner <t...@linutronix.de>
> CommitDate: Tue, 1 Mar 2016 20:36:53 +0100
> 
> cpu/hotplug: Restructure FROZEN state handling
> 
> There are only a few callbacks which really care about FROZEN
> vs. !FROZEN. No need to have extra states for this.
> 
> Publish the frozen state in an extra variable which is updated under
> the hotplug lock and let the users interested deal with it w/o
> imposing that extra state checks on everyone.
> 
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> Cc: linux-a...@vger.kernel.org
> Cc: Rik van Riel <r...@redhat.com>
> Cc: Rafael Wysocki <rafael.j.wyso...@intel.com>
> Cc: "Srivatsa S. Bhat" <sriva...@mit.edu>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Arjan van de Ven <ar...@linux.intel.com>
> Cc: Sebastian Siewior <bige...@linutronix.de>
> Cc: Rusty Russell <ru...@rustcorp.com.au>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Oleg Nesterov <o...@redhat.com>
> Cc: Tejun Heo <t...@kernel.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Paul McKenney <paul...@linux.vnet.ibm.com>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Paul Turner <p...@google.com>
> Link: http://lkml.kernel.org/r/20160226182340.334912...@linutronix.de
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> ---

Reviewed-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>

Regards,
Srivatsa S. Bhat

>  include/linux/cpu.h |  2 ++
>  kernel/cpu.c| 69 
> ++---
>  2 files changed, 31 insertions(+), 40 deletions(-)
> 
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index d2ca8c3..f2fb549 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -118,6 +118,7 @@ enum {
>  
>  
>  #ifdef CONFIG_SMP
> +extern bool cpuhp_tasks_frozen;
>  /* Need to know about CPUs going up/down? */
>  #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE)
>  #define cpu_notifier(fn, pri) {  \
> @@ -177,6 +178,7 @@ extern void cpu_maps_update_done(void);
>  #define cpu_notifier_register_done   cpu_maps_update_done
>  
>  #else/* CONFIG_SMP */
> +#define cpuhp_tasks_frozen   0
>  
>  #define cpu_notifier(fn, pri)do { (void)(fn); } while (0)
>  #define __cpu_notifier(fn, pri)  do { (void)(fn); } while (0)
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 5b9d396..41a6cb8 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -29,6 +29,8 @@
>  #ifdef CONFIG_SMP
>  /* Serializes the updates to cpu_online_mask, cpu_present_mask */
>  static DEFINE_MUTEX(cpu_add_remove_lock);
> +bool cpuhp_tasks_frozen;
> +EXPORT_SYMBOL_GPL(cpuhp_tasks_frozen);
>  
>  /*
>   * The following two APIs (cpu_maps_update_begin/done) must be used when
> @@ -207,27 +209,30 @@ int __register_cpu_notifier(struct notifier_block *nb)
>   return raw_notifier_chain_register(&cpu_chain, nb);
>  }
>  
> -static int __cpu_notify(unsigned long val, void *v, int nr_to_call,
> +static int __cpu_notify(unsigned long val, unsigned int cpu, int nr_to_call,
>   int *nr_calls)
>  {
> + unsigned long mod = cpuhp_tasks_frozen ? CPU_TASKS_FROZEN : 0;
> + void *hcpu = (void *)(long)cpu;
> +
>   int ret;
>  
> - ret = __raw_notifier_call_chain(&cpu_chain, val, v, nr_to_call,
> + ret = __raw_notifier_call_chain(&cpu_chain, val | mod, hcpu, nr_to_call,
>   nr_calls);
>  
>   return notifier_to_errno(ret);
>  }
>  
> -static int cpu_notify(unsigned long val, void *v)
> +static int cpu_notify(unsigned long val, unsigned int cpu)
>  {
> - return __cpu_notify(val, v, -1, NULL);
> + return __cpu_notify(val, cpu, -1, NULL);
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>  
> -static void cpu_notify_nofail(unsigned long val, void *v)
> +static void cpu_notify_nofail(unsigned long val, unsigned int cpu)
>  {
> - BUG_ON(cpu_notify(val, v));
> + BUG_ON(cpu_notify(val, cpu));
>  }
>  EXPORT_SYMBOL(register_cpu_notifier);
>  EXPORT_SYMBOL(__register_cpu_notifier);
> @@ -311,27 +316,21 @@ static inline void check_for_tasks(int dead_cpu)
>   read_unlock(&tasklist_lock);
>  }
>  
> -struct take_cpu_down_param {
> - unsigned long mod;
> - void *hcpu;
> -}

Re: [tip:smp/hotplug] cpu/hotplug: Split out cpu down functions

2016-03-02 Thread Srivatsa S. Bhat
On 3/1/16 2:52 PM, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  984581728eb4b2e10baed3d606f85a119795b207
> Gitweb: http://git.kernel.org/tip/984581728eb4b2e10baed3d606f85a119795b207
> Author: Thomas Gleixner <t...@linutronix.de>
> AuthorDate: Fri, 26 Feb 2016 18:43:25 +
> Committer:  Thomas Gleixner <t...@linutronix.de>
> CommitDate: Tue, 1 Mar 2016 20:36:53 +0100
> 
> cpu/hotplug: Split out cpu down functions
> 
> Split cpu_down in separate functions in preparation for state machine
> conversion.
> 
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> Cc: linux-a...@vger.kernel.org
> Cc: Rik van Riel <r...@redhat.com>
> Cc: Rafael Wysocki <rafael.j.wyso...@intel.com>
> Cc: "Srivatsa S. Bhat" <sriva...@mit.edu>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Arjan van de Ven <ar...@linux.intel.com>
> Cc: Sebastian Siewior <bige...@linutronix.de>
> Cc: Rusty Russell <ru...@rustcorp.com.au>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Oleg Nesterov <o...@redhat.com>
> Cc: Tejun Heo <t...@kernel.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Paul McKenney <paul...@linux.vnet.ibm.com>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Paul Turner <p...@google.com>
> Link: http://lkml.kernel.org/r/20160226182340.511796...@linutronix.de
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> ---

Reviewed-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>

Regards,
Srivatsa S. Bhat

>  kernel/cpu.c | 83 
> ++--
>  1 file changed, 53 insertions(+), 30 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 15a4136..0b5d259 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -266,11 +266,6 @@ static int bringup_cpu(unsigned int cpu)
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> -
> -static void cpu_notify_nofail(unsigned long val, unsigned int cpu)
> -{
> - BUG_ON(cpu_notify(val, cpu));
> -}
>  EXPORT_SYMBOL(register_cpu_notifier);
>  EXPORT_SYMBOL(__register_cpu_notifier);
>  
> @@ -353,6 +348,25 @@ static inline void check_for_tasks(int dead_cpu)
>   read_unlock(&tasklist_lock);
>  }
>  
> +static void cpu_notify_nofail(unsigned long val, unsigned int cpu)
> +{
> + BUG_ON(cpu_notify(val, cpu));
> +}
> +
> +static int notify_down_prepare(unsigned int cpu)
> +{
> + int err, nr_calls = 0;
> +
> + err = __cpu_notify(CPU_DOWN_PREPARE, cpu, -1, &nr_calls);
> + if (err) {
> + nr_calls--;
> + __cpu_notify(CPU_DOWN_FAILED, cpu, nr_calls, NULL);
> + pr_warn("%s: attempt to take down CPU %u failed\n",
> + __func__, cpu);
> + }
> + return err;
> +}
> +
>  /* Take this CPU down. */
>  static int take_cpu_down(void *_param)
>  {
> @@ -371,29 +385,9 @@ static int take_cpu_down(void *_param)
>   return 0;
>  }
>  
> -/* Requires cpu_add_remove_lock to be held */
> -static int _cpu_down(unsigned int cpu, int tasks_frozen)
> +static int takedown_cpu(unsigned int cpu)
>  {
> - int err, nr_calls = 0;
> -
> - if (num_online_cpus() == 1)
> - return -EBUSY;
> -
> - if (!cpu_online(cpu))
> - return -EINVAL;
> -
> - cpu_hotplug_begin();
> -
> - cpuhp_tasks_frozen = tasks_frozen;
> -
> - err = __cpu_notify(CPU_DOWN_PREPARE, cpu, -1, &nr_calls);
> - if (err) {
> - nr_calls--;
> - __cpu_notify(CPU_DOWN_FAILED, cpu, nr_calls, NULL);
> - pr_warn("%s: attempt to take down CPU %u failed\n",
> - __func__, cpu);
> - goto out_release;
> - }
> + int err;
>  
>   /*
>* By now we've cleared cpu_active_mask, wait for all preempt-disabled
> @@ -426,7 +420,7 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen)
>   /* CPU didn't die: tell everyone.  Can't complain. */
>   cpu_notify_nofail(CPU_DOWN_FAILED, cpu);
>   irq_unlock_sparse();
> - goto out_release;
> + return err;
>   }
>   BUG_ON(cpu_online(cpu));
>  
> @@ -449,11 +443,40 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen)
>   /* This actually kills the CPU. */
>   __cpu_die(cpu);
>  
> - /* CPU is completely dead: tell everyone.  Too late to complain. */
>   tick_cleanup_dead_cpu(cpu);
> - cpu_notify_nofail(CPU_DEAD, cpu);
> + return 0;
> +}
>  
> +static int notify_dead(unsigned int cpu)
> +{
> + cpu_notify_nofail(CPU_DEAD, cpu);

Re: [tip:smp/hotplug] cpu/hotplug: Restructure cpu_up code

2016-03-02 Thread Srivatsa S. Bhat
On 3/1/16 2:52 PM, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  ba997462435f48ad1501320e9da8770fd40c59b1
> Gitweb: http://git.kernel.org/tip/ba997462435f48ad1501320e9da8770fd40c59b1
> Author: Thomas Gleixner <t...@linutronix.de>
> AuthorDate: Fri, 26 Feb 2016 18:43:24 +
> Committer:  Thomas Gleixner <t...@linutronix.de>
> CommitDate: Tue, 1 Mar 2016 20:36:53 +0100
> 
> cpu/hotplug: Restructure cpu_up code
> 
> Split out into separate functions, so we can convert it to a state machine.
> 
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> Cc: linux-a...@vger.kernel.org
> Cc: Rik van Riel <r...@redhat.com>
> Cc: Rafael Wysocki <rafael.j.wyso...@intel.com>
> Cc: "Srivatsa S. Bhat" <sriva...@mit.edu>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Arjan van de Ven <ar...@linux.intel.com>
> Cc: Sebastian Siewior <bige...@linutronix.de>
> Cc: Rusty Russell <ru...@rustcorp.com.au>
> Cc: Steven Rostedt <rost...@goodmis.org>
> Cc: Oleg Nesterov <o...@redhat.com>
> Cc: Tejun Heo <t...@kernel.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Paul McKenney <paul...@linux.vnet.ibm.com>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Paul Turner <p...@google.com>
> Link: http://lkml.kernel.org/r/20160226182340.429389...@linutronix.de
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> ---

Reviewed-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>

Regards,
Srivatsa S. Bhat

>  kernel/cpu.c | 69 
> +---
>  1 file changed, 47 insertions(+), 22 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 41a6cb8..15a4136 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -228,6 +228,43 @@ static int cpu_notify(unsigned long val, unsigned int 
> cpu)
>   return __cpu_notify(val, cpu, -1, NULL);
>  }
>  
> +/* Notifier wrappers for transitioning to state machine */
> +static int notify_prepare(unsigned int cpu)
> +{
> + int nr_calls = 0;
> + int ret;
> +
> + ret = __cpu_notify(CPU_UP_PREPARE, cpu, -1, &nr_calls);
> + if (ret) {
> + nr_calls--;
> + printk(KERN_WARNING "%s: attempt to bring up CPU %u failed\n",
> + __func__, cpu);
> + __cpu_notify(CPU_UP_CANCELED, cpu, nr_calls, NULL);
> + }
> + return ret;
> +}
> +
> +static int notify_online(unsigned int cpu)
> +{
> + cpu_notify(CPU_ONLINE, cpu);
> + return 0;
> +}
> +
> +static int bringup_cpu(unsigned int cpu)
> +{
> + struct task_struct *idle = idle_thread_get(cpu);
> + int ret;
> +
> + /* Arch-specific enabling code. */
> + ret = __cpu_up(cpu, idle);
> + if (ret) {
> + cpu_notify(CPU_UP_CANCELED, cpu);
> + return ret;
> + }
> + BUG_ON(!cpu_online(cpu));
> + return 0;
> +}
> +
>  #ifdef CONFIG_HOTPLUG_CPU
>  
>  static void cpu_notify_nofail(unsigned long val, unsigned int cpu)
> @@ -481,7 +518,7 @@ void smpboot_thread_init(void)
>  static int _cpu_up(unsigned int cpu, int tasks_frozen)
>  {
>   struct task_struct *idle;
> - int ret, nr_calls = 0;
> + int ret;
>  
>   cpu_hotplug_begin();
>  
> @@ -496,33 +533,21 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen)
>   goto out;
>   }
>  
> + cpuhp_tasks_frozen = tasks_frozen;
> +
>   ret = smpboot_create_threads(cpu);
>   if (ret)
>   goto out;
>  
> - cpuhp_tasks_frozen = tasks_frozen;
> -
> - ret = __cpu_notify(CPU_UP_PREPARE, cpu, -1, &nr_calls);
> - if (ret) {
> - nr_calls--;
> - pr_warn("%s: attempt to bring up CPU %u failed\n",
> - __func__, cpu);
> - goto out_notify;
> - }
> -
> - /* Arch-specific enabling code. */
> - ret = __cpu_up(cpu, idle);
> -
> - if (ret != 0)
> - goto out_notify;
> - BUG_ON(!cpu_online(cpu));
> + ret = notify_prepare(cpu);
> + if (ret)
> + goto out;
>  
> - /* Now call notifier in preparation. */
> - cpu_notify(CPU_ONLINE, cpu);
> + ret = bringup_cpu(cpu);
> + if (ret)
> + goto out;
>  
> -out_notify:
> - if (ret != 0)
> - __cpu_notify(CPU_UP_CANCELED, cpu, nr_calls, NULL);
> + notify_online(cpu);
>  out:
>   cpu_hotplug_done();
>  
> 



Re: [tip:sched/core] irq_work: Remove BUG_ON in irq_work_run()

2014-07-17 Thread Srivatsa S. Bhat

Hi Ingo,

On 07/05/2014 04:13 PM, tip-bot for Peter Zijlstra wrote:
> Commit-ID:  a77353e5eb56b6c6098bfce59aff1f449451b0b7
> Gitweb: http://git.kernel.org/tip/a77353e5eb56b6c6098bfce59aff1f449451b0b7
> Author: Peter Zijlstra 
> AuthorDate: Wed, 25 Jun 2014 07:13:07 +0200
> Committer:  Ingo Molnar 
> CommitDate: Sat, 5 Jul 2014 11:17:26 +0200
> 
> irq_work: Remove BUG_ON in irq_work_run()
>

I believe this fix has to go into 3.16 itself, since this fixes a CPU hotplug
regression on many systems, as reported here:

https://lkml.org/lkml/2014/6/24/765
https://lkml.org/lkml/2014/7/1/473
https://lkml.org/lkml/2014/7/4/16

I didn't find this fix in mainline yet, so I thought of sending a note.

Thank you!

Regards,
Srivatsa S. Bhat
 
> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
> pending IPI callbacks before CPU offline"), which ends up calling
> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which
> is not from IRQ context.
> 
> And since that already calls irq_work_run() from the hotplug path,
> remove our entire hotplug handling.
> 
> Reported-by: Stephen Warren 
> Tested-by: Stephen Warren 
> Reviewed-by: Srivatsa S. Bhat 
> Cc: Frederic Weisbecker 
> Cc: Linus Torvalds 
> Signed-off-by: Peter Zijlstra 
> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
> Signed-off-by: Ingo Molnar 
> ---
>  kernel/irq_work.c | 46 --
>  1 file changed, 4 insertions(+), 42 deletions(-)
> 
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 4b0a890..e6bcbe7 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -160,20 +160,14 @@ static void irq_work_run_list(struct llist_head *list)
>   }
>  }
> 
> -static void __irq_work_run(void)
> -{
> - irq_work_run_list(&__get_cpu_var(raised_list));
> - irq_work_run_list(&__get_cpu_var(lazy_list));
> -}
> -
>  /*
> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> - * context with local IRQs disabled.
> + * hotplug calls this through:
> + *  hotplug_cfd() -> flush_smp_call_function_queue()
>   */
>  void irq_work_run(void)
>  {
> - BUG_ON(!in_irq());
> - __irq_work_run();
> + irq_work_run_list(&__get_cpu_var(raised_list));
> + irq_work_run_list(&__get_cpu_var(lazy_list));
>  }
>  EXPORT_SYMBOL_GPL(irq_work_run);
> 
> @@ -189,35 +183,3 @@ void irq_work_sync(struct irq_work *work)
>   cpu_relax();
>  }
>  EXPORT_SYMBOL_GPL(irq_work_sync);
> -
> -#ifdef CONFIG_HOTPLUG_CPU
> -static int irq_work_cpu_notify(struct notifier_block *self,
> -unsigned long action, void *hcpu)
> -{
> - long cpu = (long)hcpu;
> -
> - switch (action) {
> - case CPU_DYING:
> - /* Called from stop_machine */
> - if (WARN_ON_ONCE(cpu != smp_processor_id()))
> - break;
> - __irq_work_run();
> - break;
> - default:
> - break;
> - }
> - return NOTIFY_OK;
> -}
> -
> -static struct notifier_block cpu_notify;
> -
> -static __init int irq_work_init_cpu_notifier(void)
> -{
> - cpu_notify.notifier_call = irq_work_cpu_notify;
> - cpu_notify.priority = 0;
> - register_cpu_notifier(&cpu_notify);
> - return 0;
> -}
> -device_initcall(irq_work_init_cpu_notifier);
> -
> -#endif /* CONFIG_HOTPLUG_CPU */
> 



Re: [tip:sched/core] irq_work: Remove BUG_ON in irq_work_run()

2014-07-17 Thread Srivatsa S. Bhat

Hi Ingo,

On 07/05/2014 04:13 PM, tip-bot for Peter Zijlstra wrote:
 Commit-ID:  a77353e5eb56b6c6098bfce59aff1f449451b0b7
 Gitweb: http://git.kernel.org/tip/a77353e5eb56b6c6098bfce59aff1f449451b0b7
 Author: Peter Zijlstra pet...@infradead.org
 AuthorDate: Wed, 25 Jun 2014 07:13:07 +0200
 Committer:  Ingo Molnar mi...@kernel.org
 CommitDate: Sat, 5 Jul 2014 11:17:26 +0200
 
 irq_work: Remove BUG_ON in irq_work_run()


I believe this fix has to go into 3.16 itself, since this fixes a CPU hotplug
regression on many systems, as reported here:

https://lkml.org/lkml/2014/6/24/765
https://lkml.org/lkml/2014/7/1/473
https://lkml.org/lkml/2014/7/4/16

I didn't find this fix in mainline yet, so I thought of sending a note.

Thank you!

Regards,
Srivatsa S. Bhat
 
 Because of a collision with 8d056c48e486 (CPU hotplug, smp: flush any
 pending IPI callbacks before CPU offline), which ends up calling
 hotplug_cfd()-flush_smp_call_function_queue()-irq_work_run(), which
 is not from IRQ context.
 
 And since that already calls irq_work_run() from the hotplug path,
 remove our entire hotplug handling.
 
 Reported-by: Stephen Warren swar...@wwwdotorg.org
 Tested-by: Stephen Warren swar...@wwwdotorg.org
 Reviewed-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 Cc: Frederic Weisbecker fweis...@gmail.com
 Cc: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Peter Zijlstra pet...@infradead.org
 Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
 Signed-off-by: Ingo Molnar mi...@kernel.org
 ---
  kernel/irq_work.c | 46 --
  1 file changed, 4 insertions(+), 42 deletions(-)
 
 diff --git a/kernel/irq_work.c b/kernel/irq_work.c
 index 4b0a890..e6bcbe7 100644
 --- a/kernel/irq_work.c
 +++ b/kernel/irq_work.c
 @@ -160,20 +160,14 @@ static void irq_work_run_list(struct llist_head *list)
   }
  }
 
 -static void __irq_work_run(void)
 -{
 - irq_work_run_list(&__get_cpu_var(raised_list));
 - irq_work_run_list(&__get_cpu_var(lazy_list));
 -}
 -
  /*
 - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
 - * context with local IRQs disabled.
 + * hotplug calls this through:
 + *  hotplug_cfd() -> flush_smp_call_function_queue()
   */
  void irq_work_run(void)
  {
 - BUG_ON(!in_irq());
 - __irq_work_run();
 + irq_work_run_list(&__get_cpu_var(raised_list));
 + irq_work_run_list(&__get_cpu_var(lazy_list));
  }
  EXPORT_SYMBOL_GPL(irq_work_run);
 
 @@ -189,35 +183,3 @@ void irq_work_sync(struct irq_work *work)
   cpu_relax();
  }
  EXPORT_SYMBOL_GPL(irq_work_sync);
 -
 -#ifdef CONFIG_HOTPLUG_CPU
 -static int irq_work_cpu_notify(struct notifier_block *self,
 -unsigned long action, void *hcpu)
 -{
 - long cpu = (long)hcpu;
 -
 - switch (action) {
 - case CPU_DYING:
 - /* Called from stop_machine */
 - if (WARN_ON_ONCE(cpu != smp_processor_id()))
 - break;
 - __irq_work_run();
 - break;
 - default:
 - break;
 - }
 - return NOTIFY_OK;
 -}
 -
 -static struct notifier_block cpu_notify;
 -
 -static __init int irq_work_init_cpu_notifier(void)
 -{
 - cpu_notify.notifier_call = irq_work_cpu_notify;
 - cpu_notify.priority = 0;
 - register_cpu_notifier(&cpu_notify);
 - return 0;
 -}
 -device_initcall(irq_work_init_cpu_notifier);
 -
 -#endif /* CONFIG_HOTPLUG_CPU */
 



Re: [PATCH v3 1/2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-16 Thread Srivatsa S. Bhat
On 07/16/2014 06:43 PM, Viresh Kumar wrote:
> On 16 July 2014 16:46, Srivatsa S. Bhat  wrote:
>> Short answer: If the sysfs directory has already been created by cpufreq,
>> then yes, it will remain as it is. However, if the online operation failed
>> before that, then cpufreq won't know about that CPU at all, and no file will
>> be created.
>>
>> Long answer:
>> The existing cpufreq code does all its work (including creating the sysfs
>> directories etc) at the CPU_ONLINE stage. This stage is not expected to fail
>> (in fact even the core CPU hotplug code in kernel/cpu.c doesn't care for
>> error returns at this point). So if a CPU fails to come up in earlier stages
>> itself (such as CPU_UP_PREPARE), then cpufreq won't even hear about that CPU,
>> and hence no sysfs files will be created/linked. However, if the CPU bringup
>> operation fails during the CPU_ONLINE stage after the cpufreq's notifier has
>> been invoked, then we do nothing about it and the cpufreq sysfs files will
>> remain.
> 
> In short, the problem I mentioned before this para is genuine. And setting
> policy->cpu to the first cpu of a mask is indeed a bad idea.
> 
>>> Also, how does suspend/resume work without CONFIG_HOTPLUG_CPU ?
>>> What's the sequence of events?
>>>
>>
>> Well, CONFIG_SUSPEND doesn't have an explicit dependency on HOTPLUG_CPU, but
>> SMP systems usually use CONFIG_PM_SLEEP_SMP, which sets CONFIG_HOTPLUG_CPU.
> 
> I read usually as *optional*
> 
>> (I guess the reason why CONFIG_SUSPEND doesn't depend on HOTPLUG_CPU is
>> because suspend is possible even on uniprocessor systems and hence the
>> Kconfig dependency wasn't really justified).
> 
> Again the same question, how do we suspend when HOTPLUG is disabled?
> 

From what I understand, if you disable HOTPLUG_CPU and enable CONFIG_SUSPEND
and try suspend/resume on an SMP system, the disable_nonboot_cpus() call will
return silently without doing anything. Thus, suspend will fail silently and
the system might have trouble resuming.
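
(For illustration: the call is silent because, without PM_SLEEP_SMP and hence without
HOTPLUG_CPU, it degenerates into a stub. This is a rough sketch of the include/linux/cpu.h
arrangement of that era; the exact lines may differ:)

#ifdef CONFIG_PM_SLEEP_SMP
extern int disable_nonboot_cpus(void);
extern void enable_nonboot_cpus(void);
#else /* !CONFIG_PM_SLEEP_SMP */
/* "Disabling" the non-boot CPUs succeeds trivially, without actually
 * offlining anything, so suspend proceeds with all CPUs still running. */
static inline int disable_nonboot_cpus(void) { return 0; }
static inline void enable_nonboot_cpus(void) {}
#endif /* !CONFIG_PM_SLEEP_SMP */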

But surprisingly we have never had such bug reports so far! Most probably this
is because PM_SLEEP_SMP has a default of y (which in turn selects HOTPLUG_CPU):

config PM_SLEEP_SMP
def_bool y
depends on SMP
depends on ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE
depends on PM_SLEEP
select HOTPLUG_CPU

So I guess nobody really tried turning this off on SMP systems and then trying
suspend. Then I started looking at the git history and wondered where this
Kconfig dependency between SUSPEND and SMP<->HOTPLUG_CPU got messed up. But
instead I found that the initial commit itself didn't get the dependency right.

Commit 296699de6bdc (Introduce CONFIG_SUSPEND for suspend-to-Ram and standby)
introduced all the Kconfig options, and it indeed mentions this in the
changelog: "Make HOTPLUG_CPU be selected automatically if SUSPEND or
HIBERNATION has been chosen and the kernel is intended for SMP systems". But
unfortunately, the code didn't get it right because it made CONFIG_SUSPEND
depend on SUSPEND_SMP_POSSIBLE instead of SUSPEND_SMP.

In other words, we have had this incorrect dependency all the time!

Regards,
Srivatsa S. Bhat


Re: [PATCH v3 1/2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-16 Thread Srivatsa S. Bhat
 various places in the code. Otherwise it becomes very hard to follow
your thought-flow just by looking at the patch. So please split up the patch
further and also make the changelogs useful to review the patch :-)

The link that Viresh gave above also did a lot of code reorganization in
cpufreq, so it should give you a good example of how to proceed.

[...]

>> __cpufreq_add_dev(dev, NULL);
>> break;
>>
>> case CPU_DOWN_PREPARE:
>> -   __cpufreq_remove_dev_prepare(dev, NULL);
>> -   break;
>> -
>> -   case CPU_POST_DEAD:
>> -   __cpufreq_remove_dev_finish(dev, NULL);
>> -   break;
>> -
>> -   case CPU_DOWN_FAILED:
>> -   __cpufreq_add_dev(dev, NULL);
>> +   __cpufreq_remove_dev(dev, NULL);
> 
> @Srivatsa: You might want to have a look at this, remove sequence was
> separated for some purpose and I am just not able to concentrate enough
> to think of that, just too many cases running in my mind :)
> 

Yeah, we had split it into _remove_dev_prepare() and _remove_dev_finish()
to avoid a few potential deadlocks. We wanted to call _remove_dev_prepare()
in the DOWN_PREPARE stage and then call _remove_dev_finish() (which waits
for the kobject refcount to drop) in the POST_DEAD stage. That is, we wanted
to do the kobject cleanup after releasing the hotplug lock, and POST_DEAD stage
was well-suited for that.

Commit 1aee40ac9c8 (cpufreq: Invoke __cpufreq_remove_dev_finish() after
releasing cpu_hotplug.lock) explains this in detail. Saravana, please take a
look at that reasoning and ensure that your patch doesn't re-introduce those
deadlock possibilities!
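
(For reference, a rough sketch of what that split looks like at the notifier level; this is
illustrative only, reusing the function names from the hunk above rather than the exact
upstream code:)

static int cpufreq_cpu_callback(struct notifier_block *nfb,
                                unsigned long action, void *hcpu)
{
        unsigned int cpu = (unsigned long)hcpu;
        struct device *dev = get_cpu_device(cpu);

        if (!dev)
                return NOTIFY_OK;

        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_DOWN_PREPARE:
                /* cpu_hotplug.lock is held: stop the governor and unlink
                 * sysfs, but do NOT wait on the kobject refcount here. */
                __cpufreq_remove_dev_prepare(dev, NULL);
                break;

        case CPU_POST_DEAD:
                /* cpu_hotplug.lock has been released: now it is safe to
                 * wait for the kobject refcount to drop. */
                __cpufreq_remove_dev_finish(dev, NULL);
                break;

        case CPU_DOWN_FAILED:
                __cpufreq_add_dev(dev, NULL);
                break;
        }
        return NOTIFY_OK;
}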

>> break;
>> }
>> }
> 
> I am still not sure if everything will work as expected as I seriously doubt
> my reviewing capabilities. There might be corner cases which I am still
> missing.
> 

Regards,
Srivatsa S. Bhat


Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-16 Thread Srivatsa S. Bhat
On 07/16/2014 11:14 AM, Viresh Kumar wrote:
> On 15 July 2014 12:28, Srivatsa S. Bhat  wrote:
>> Wait, allowing an offline CPU to be the policy->cpu (i.e., the CPU which is
>> considered as the master of the policy/group) is just absurd.
> 
> Yeah, that was as Absurd as I am :)
> 

I have had my own share of silly ideas over the years; so don't worry, we are
all in the same boat ;-)

>> The goal of this patchset should be to just de-couple the sysfs 
>> files/ownership
>> from the policy->cpu to an extent where it doesn't matter who owns those
>> files, and probably make it easier to do CPU hotplug without having to
>> destroy and recreate the files on every hotplug operation.
> 
> I went to that Absurd idea because we thought we can skip playing with
> the sysfs nodes on suspend/hotplug.
> 
> And if policy->cpu keeps changing with hotplug, we *may* have to keep
> sysfs stuff moving as well. One way to avoid that is by using something
> like: policy->sysfs_cpu, but wasn't sure if that's the right path to follow.
>

Hmm, I understand.. Even I don't have any suggestions as of now, since I
haven't spent enough time thinking of alternatives yet.

> Lets see what Saravana's new patchset has for us :)
> 

Yep :-)

Regards,
Srivatsa S. Bhat


Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-16 Thread Srivatsa S. Bhat
On 07/15/2014 11:05 PM, skan...@codeaurora.org wrote:
> 
> Srivatsa S. Bhat wrote:
>> On 07/15/2014 11:06 AM, Saravana Kannan wrote:
>>> On 07/14/2014 09:35 PM, Viresh Kumar wrote:
>>>> On 15 July 2014 00:38, Saravana Kannan  wrote:
>>>>> Yeah, it definitely crashes if policy->cpu if an offline cpu. Because
>>>>> the
>>>>> mutex would be uninitialized if it's stopped after boot or it would
>>>>> never
>>>>> have been initialized (depending on how you fix policy->cpu at boot).
>>>>>
>>>>> Look at this snippet on the actual tree and it should be pretty
>>>>> evident.
>>>>
>>>> Yeah, I missed it. So the problem is we initialize timer_mutex's for
>>>> policy->cpus. So we need to do that just for policy->cpu and also we
>>>> don't
>>>> need a per-cpu timer_mutex anymore.
>>>>
>>>
>>> Btw, I tried to take a stab at removing any assumption in cpufreq code
>>> about policy->cpu being ONLINE.
>>
>> Wait, allowing an offline CPU to be the policy->cpu (i.e., the CPU which
>> is
>> considered as the master of the policy/group) is just absurd. If there is
>> no leader, there is no army. We should NOT sacrifice sane semantics for
>> the
>> sake of simplifying the code.
>>
>>> There are 160 instances of those, of which
>>> 23 are in cpufreq.c
>>>
>>
>> And that explains why. It is just *natural* to assume that the CPUs
>> governed
>> by a policy are online. Especially so for the CPU which is supposed to be
>> the policy leader. Let us please not change that - it will become
>> counter-intuitive if we do so. [ The other reason is that physical hotplug
>> is also possible on some systems... in that case your code might make a
>> CPU
>> which is not even present (but possible) as the policy->cpu.. and great
>> 'fun'
>> will ensue after that ;-( ]
>>
>> The goal of this patchset should be to just de-couple the sysfs
>> files/ownership
>> from the policy->cpu to an extent where it doesn't matter who owns those
>> files, and probably make it easier to do CPU hotplug without having to
>> destroy and recreate the files on every hotplug operation.
>>
>> This is exactly why the _implementation_ matters in this particular case -
>> if we can't achieve the simplification by keeping sane semantics, then we
>> shouldn't do the simplification!
>>
>> That said, I think we should keep trying - we haven't exhausted all ideas
>> yet :-)
>>
> 
> I don't think we disagree. To summarize this topic: I tried to keep the
> policy->cpu an actual online CPU so as to not break existing semantics in
> this patch. Viresh asked "why not fix it at boot?". My response was to
> keep it an online CPU and give it a shot in a separate patch if we really
> want that. It's too risky to do that in this patch and also not a
> mandatory change for this patch.
> 
> I think we can work out the details on the need to fixing policy->cpu at
> boot and whether there's even a need for policy->cpu (when we already have
> policy->cpus) in a separate thread after the dust settles on this one?
>

Sure, that sounds good!

Regards,
Srivatsa S. Bhat


Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-15 Thread Srivatsa S. Bhat
On 07/15/2014 11:06 AM, Saravana Kannan wrote:
> On 07/14/2014 09:35 PM, Viresh Kumar wrote:
>> On 15 July 2014 00:38, Saravana Kannan  wrote:
>>> Yeah, it definitely crashes if policy->cpu if an offline cpu. Because
>>> the
>>> mutex would be uninitialized if it's stopped after boot or it would
>>> never
>>> have been initialized (depending on how you fix policy->cpu at boot).
>>>
>>> Look at this snippet on the actual tree and it should be pretty evident.
>>
>> Yeah, I missed it. So the problem is we initialize timer_mutex's for
>> policy->cpus. So we need to do that just for policy->cpu and also we
>> don't
>> need a per-cpu timer_mutex anymore.
>>
> 
> Btw, I tried to take a stab at removing any assumption in cpufreq code
> about policy->cpu being ONLINE.

Wait, allowing an offline CPU to be the policy->cpu (i.e., the CPU which is
considered as the master of the policy/group) is just absurd. If there is
no leader, there is no army. We should NOT sacrifice sane semantics for the
sake of simplifying the code.

> There are 160 instances of those, of which
> 23 are in cpufreq.c
>

And that explains why. It is just *natural* to assume that the CPUs governed
by a policy are online. Especially so for the CPU which is supposed to be
the policy leader. Let us please not change that - it will become
counter-intuitive if we do so. [ The other reason is that physical hotplug
is also possible on some systems... in that case your code might make a CPU
which is not even present (but possible) as the policy->cpu.. and great 'fun'
will ensue after that ;-( ]
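
(To make that concern concrete: a hypothetical helper, not something from the patch, would
have to restrict the choice to online CPUs explicitly, e.g.:)

/* Hypothetical sketch: cpumask_first(policy->related_cpus) alone could
 * hand us an offline, or even a not-present, CPU as the leader. */
static unsigned int pick_policy_leader(struct cpufreq_policy *policy)
{
        unsigned int cpu = cpumask_first_and(policy->related_cpus,
                                             cpu_online_mask);

        /* Fall back to the current leader if no related CPU is online. */
        return cpu < nr_cpu_ids ? cpu : policy->cpu;
}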

The goal of this patchset should be to just de-couple the sysfs files/ownership
from the policy->cpu to an extent where it doesn't matter who owns those
files, and probably make it easier to do CPU hotplug without having to
destroy and recreate the files on every hotplug operation.

This is exactly why the _implementation_ matters in this particular case -
if we can't achieve the simplification by keeping sane semantics, then we
shouldn't do the simplification!

That said, I think we should keep trying - we haven't exhausted all ideas
yet :-)

Regards,
Srivatsa S. Bhat

> So, even if we are sure cpufreq.c is fine, it's 137 other uses spread
> across all the other files. I definitely don't want to try and fix those
> as part of this patch. Way too risky and hard to get the test coverage
> it would need. Even some of the acpi cpufreq drivers seem to be making
> this assumption.
> 
> Btw, I think v3 is done. I did some testing and it was fine. But made
> some minor changes. Will test tomorrow to make sure I didn't break
> anything with the minor changes and then send them out.
> 


Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-11 Thread Srivatsa S. Bhat
On 07/11/2014 09:48 AM, Saravana Kannan wrote:
> The CPUfreq driver moves the cpufreq policy ownership between CPUs when
> CPUs within a cluster (CPUs sharing same policy) go ONLINE/OFFLINE. When
> moving policy ownership between CPUs, it also moves the cpufreq sysfs
> directory between CPUs and also fixes up the symlinks of the other CPUs in
> the cluster.
> 
> Also, when all the CPUs in a cluster go OFFLINE, all the sysfs nodes and
> directories are deleted, the kobject is released and the policy is freed.
> And when the first CPU in a cluster comes up, the policy is reallocated and
> initialized, kobject is acquired, the sysfs nodes are created or symlinked,
> etc.
> 
> All these steps end up creating unnecessarily complicated code and locking.
> There's no real benefit to adding/removing/moving the sysfs nodes and the
> policy between CPUs. Other per CPU sysfs directories like power and cpuidle
> are left alone during hotplug. So there's some precedence to what this
> patch is trying to do.
> 
> This patch simplifies a lot of the code and locking by removing the
> adding/removing/moving of policy/sysfs/kobj and just leaves the cpufreq
> directory and policy in place irrespective of whether the CPUs are
> ONLINE/OFFLINE.
> 
> Leaving the policy, sysfs and kobject in place also brings these additional
> benefits:
> * Faster suspend/resume.
> * Faster hotplug.
> * Sysfs file permissions maintained across hotplug without userspace
>   workarounds.
> * Policy settings and governor tunables maintained across suspend/resume
>   and hotplug.
> * Cpufreq stats would be maintained across hotplug for all CPUs and can be
>   queried even after CPU goes OFFLINE.
> 
> Change-Id: I39c395e1fee8731880c0fd7c8a9c1d83e2e4b8d0
> Tested-by: Stephen Boyd 
> Signed-off-by: Saravana Kannan 
> ---
> 
> Preliminary testing has been done. cpufreq directories are getting created
> properly. Online/offline of CPUs work. Policies remain unmodifiable from
> userspace when all policy CPUs are offline.
> 
> Error handling code has NOT been updated.
> 
> I've added a bunch of FIXME comments next to where I'm not sure about the
> locking in the existing code. I believe most of the try_lock's were present
> to prevent a deadlock between sysfs lock and the cpufreq locks. Now that
> the sysfs entries are not touched after creating them, we should be able to
> replace most/all of these try_lock's with a normal lock.
> 
> This patch has more room for code simplification, but I would like to get
> some acks for the functionality and this code before I do further
> simplification.
> 

The idea behind this work is very welcome indeed! IMHO, there is nothing
conceptually wrong in maintaining the per-cpu sysfs files across CPU hotplug
(as long as we take care to return appropriate error codes if userspace
tries to set values using the control files of offline CPUs). So, it really
boils down to whether or not we get the implementation right; the idea itself
looks fine as of now. Hence, your efforts in making this patch(set) easier to
review will certainly help. Perhaps you can simplify the code later, but at
this point, splitting up this patch into multiple smaller, reviewable pieces
(accompanied by well-written changelogs that explain the intent) is the utmost
priority. Just like Viresh, even I had a hard time reviewing all of this in
one go.
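
(As an illustration of the error-code point above, and only as a sketch rather than code
from this patch, the common sysfs store path could simply refuse writes while every CPU of
the policy is offline:)

static ssize_t store(struct kobject *kobj, struct attribute *attr,
                     const char *buf, size_t count)
{
        struct cpufreq_policy *policy = to_policy(kobj);
        struct freq_attr *fattr = to_attr(attr);
        ssize_t ret = -EBUSY;

        get_online_cpus();
        /* Reject writes when no CPU of this policy is online. */
        if (cpumask_intersects(policy->cpus, cpu_online_mask) && fattr->store)
                ret = fattr->store(policy, buf, count);
        put_online_cpus();

        return ret;
}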

Thank you for taking up this work!

Regards,
Srivatsa S. Bhat

> I should also be able to remove get_online_cpus() in the store function and
> replace it with just a check for policy->governor_enabled. That should
> theoretically reduce some contention between cpufreq stats check and
> hotplug of unrelated CPUs.
> 
> Appreciate all the feedback.
> 
> Thanks,
> Saravana
> 
>  drivers/cpufreq/cpufreq.c | 331 
> ++
>  1 file changed, 69 insertions(+), 262 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 62259d2..e350b15 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -859,16 +859,16 @@ void cpufreq_sysfs_remove_file(const struct attribute 
> *attr)
>  }
>  EXPORT_SYMBOL(cpufreq_sysfs_remove_file);
> 
> -/* symlink affected CPUs */
> +/* symlink related CPUs */
>  static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy)
>  {
> - unsigned int j;
> + unsigned int j, first_cpu = cpumask_first(policy->related_cpus);
>   int ret = 0;
> 
> - for_each_cpu(j, policy->cpus) {
> + for_each_cpu(j, policy->related_cpus) {
>   struct device *cpu_dev;
> 
> - if (j == policy->cpu)
> + if (j == first_cpu)
>   continue;
> 
>


Re: [PATCH] cpufreq: report driver's successful {un}registration

2014-07-10 Thread Srivatsa S. Bhat
On 06/26/2014 02:21 PM, Viresh Kumar wrote:
> We do report driver's successful {un}registration from cpufreq core, but it is
> done with pr_debug() and so this doesn't appear in boot logs.
> 
> Convert this to pr_info() to make it visible in logs.
> 

While at it, let's also standardize those messages, since they will be
more visible now.

> Signed-off-by: Viresh Kumar 
> ---
>  drivers/cpufreq/cpufreq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 62259d2..63d8f8f 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2468,7 +2468,7 @@ int cpufreq_register_driver(struct cpufreq_driver 
> *driver_data)
>   }
> 
>   register_hotcpu_notifier(&cpufreq_cpu_notifier);
> - pr_debug("driver %s up and running\n", driver_data->name);
> + pr_info("driver %s up and running\n", driver_data->name);

How about "Registered cpufreq driver: %s\n"

> 
>   return 0;
>  err_if_unreg:
> @@ -2499,7 +2499,7 @@ int cpufreq_unregister_driver(struct cpufreq_driver 
> *driver)
>   if (!cpufreq_driver || (driver != cpufreq_driver))
>   return -EINVAL;
> 
> - pr_debug("unregistering driver %s\n", driver->name);
> + pr_info("unregistering driver %s\n", driver->name);

And "Unregistered cpufreq driver: %s\n"

(Also, its probably a good idea to have 2 prints here, just like we have for
cpufreq_register_driver() - one with pr_debug() at the beginning of the
function which tells us that we are *about* to register the driver, and then
a second print with pr_info() at the end of the function that tells us that
we successfully registered the driver. We can do the same thing for
unregistration as well.)

> 
>   subsys_interface_unregister(&cpufreq_interface);
>   if (cpufreq_boost_supported())
> 
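
(Putting the two suggestions together, the tail of cpufreq_register_driver() would then read
roughly as follows; this is an illustrative sketch using the message text proposed above,
not the actual patch:)

int cpufreq_register_driver(struct cpufreq_driver *driver_data)
{
        pr_debug("trying to register driver %s\n", driver_data->name);

        /* ... existing validation and registration steps, unchanged ... */

        register_hotcpu_notifier(&cpufreq_cpu_notifier);
        pr_info("Registered cpufreq driver: %s\n", driver_data->name);

        return 0;
}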

Regards,
Srivatsa S. Bhat


Re: [PATCH v7 2/2] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 09:12 PM, Sasha Levin wrote:
> On 05/26/2014 07:08 AM, Srivatsa S. Bhat wrote:
>> During CPU offline, in stop-machine, we don't enforce any rule in the
>> _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the 
>> other
>> CPUs disable their local interrupts. Hence, we can encounter a scenario as
>> depicted below, in which IPIs are sent by the other CPUs to the CPU going
>> offline (while it is *still* online), but the outgoing CPU notices them only
>> *after* it has gone offline.
>>
[...]
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following spew:
> 

Thanks for the bug report. Please test if this patch fixes the problem
for you:

https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=timers/nohz&id=921d8b81281ecdca686369f52165d04fa3505bd7

Regards,
Srivatsa S. Bhat

> [ 1982.600053] kernel BUG at kernel/irq_work.c:175!
> [ 1982.600053] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 1982.600053] Dumping ftrace buffer:
> [ 1982.600053](ftrace buffer empty)
> [ 1982.600053] Modules linked in:
> [ 1982.600053] CPU: 14 PID: 168 Comm: migration/14 Not tainted 
> 3.16.0-rc2-next-20140624-sasha-00024-g332b58d #726
> [ 1982.600053] task: 88036a5a3000 ti: 88036a5ac000 task.ti: 
> 88036a5ac000
> [ 1982.600053] RIP: irq_work_run (kernel/irq_work.c:175 (discriminator 1))
> [ 1982.600053] RSP: :88036a5afbe0  EFLAGS: 00010046
> [ 1982.600053] RAX: 8001 RBX:  RCX: 
> 0008
> [ 1982.600053] RDX: 000e RSI: af9185fb RDI: 
> 
> [ 1982.600053] RBP: 88036a5afc08 R08: 00099224 R09: 
> 
> [ 1982.600053] R10:  R11: 0001 R12: 
> 88036afd8400
> [ 1982.600053] R13:  R14: b0cf8120 R15: 
> b0cce5d0
> [ 1982.600053] FS:  () GS:88036ae0() 
> knlGS:
> [ 1982.600053] CS:  0010 DS:  ES:  CR0: 8005003b
> [ 1982.600053] CR2: 019485d0 CR3: 0002c7c8f000 CR4: 
> 06a0
> [ 1982.600053] Stack:
> [ 1982.600053]  ab20fbb5 0082 88036afd8440 
> 
> [ 1982.600053]  0001 88036a5afc28 ab20fca7 
> 
> [ 1982.600053]  ffef 88036a5afc78 ab19c58e 
> 000e
> [ 1982.600053] Call Trace:
> [ 1982.600053] ? flush_smp_call_function_queue (kernel/smp.c:263)
> [ 1982.600053] hotplug_cfd (kernel/smp.c:81)
> [ 1982.600053] notifier_call_chain (kernel/notifier.c:95)
> [ 1982.600053] __raw_notifier_call_chain (kernel/notifier.c:395)
> [ 1982.600053] __cpu_notify (kernel/cpu.c:202)
> [ 1982.600053] cpu_notify (kernel/cpu.c:211)
> [ 1982.600053] take_cpu_down (./arch/x86/include/asm/current.h:14 
> kernel/cpu.c:312)
> [ 1982.600053] multi_cpu_stop (kernel/stop_machine.c:201)
> [ 1982.600053] ? __stop_cpus (kernel/stop_machine.c:170)
> [ 1982.600053] cpu_stopper_thread (kernel/stop_machine.c:474)
> [ 1982.600053] ? put_lock_stats.isra.12 (./arch/x86/include/asm/preempt.h:98 
> kernel/locking/lockdep.c:254)
> [ 1982.600053] ? _raw_spin_unlock_irqrestore 
> (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 
> kernel/locking/spinlock.c:191)
> [ 1982.600053] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [ 1982.600053] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 
> kernel/locking/lockdep.c:2599)
> [ 1982.600053] smpboot_thread_fn (kernel/smpboot.c:160)
> [ 1982.600053] ? __smpboot_create_thread (kernel/smpboot.c:105)
> [ 1982.600053] kthread (kernel/kthread.c:210)
> [ 1982.600053] ? wait_for_completion (kernel/sched/completion.c:77 
> kernel/sched/completion.c:93 kernel/sched/completion.c:101 
> kernel/sched/completion.c:122)
> [ 1982.600053] ? kthread_create_on_node (kernel/kthread.c:176)
> [ 1982.600053] ret_from_fork (arch/x86/kernel/entry_64.S:349)
> [ 1982.600053] ? kthread_create_on_node (kernel/kthread.c:176)
> [ 1982.600053] Code: 00 00 00 00 e8 63 ff ff ff 48 83 c4 08 b8 01 00 00 00 5b 
> 5d c3 b8 01 00 00 00 c3 90 65 8b 04 25 a0 da 00 00 a9 00 00 0f 00 75 09 <0f> 
> 0b 0f 1f 80 00 00 00 00 55 48 89 e5 e8 2f ff ff ff 5d c3 66
> All code
> 
>0: 00 00   add%al,(%rax)
>2: 00 00   add%al,(%rax)
>4: e8 63 ff ff ff  callq  0xff6c
>9: 48 83 c4 08 add$0x8,%rsp
>d: b8 01 00 00 00  mov$0x1,%eax
>   12: 5b  pop%rbx
>   13: 5d  pop%rbp
>   14: c3  

Re: [PATCH 2/6] irq_work: Implement remote queueing

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 10:08 PM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 10:23:21AM -0600, Stephen Warren wrote:
>> On 06/25/2014 04:19 AM, Peter Zijlstra wrote:
>>> On Wed, Jun 25, 2014 at 03:24:11PM +0530, Srivatsa S. Bhat wrote:
>>>> Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
>>>> indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
>>>> doesn't need to invoke it again, AFAIU. So perhaps we can get rid of
>>>> irq_work_cpu_notify() altogether?
>>>
>>> Just so...
>>>
>>> getting up at 6am and sitting in an airport terminal doesn't seem to
>>> agree with me; any more silly fail here?
>>>
>>> ---
>>> Subject: irq_work: Remove BUG_ON in irq_work_run()
>>> From: Peter Zijlstra 
>>> Date: Wed Jun 25 07:13:07 CEST 2014
>>>
>>> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
>>> pending IPI callbacks before CPU offline"), which ends up calling
>>> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which
>>> is not from IRQ context.
>>>
>>> And since that already calls irq_work_run() from the hotplug path,
>>> remove our entire hotplug handling.
>>
>> Tested-by: Stephen Warren 
>>
>> [with the s/static// already mentioned in this thread, obviously:-)]
> 
> Right; I pushed out a fixed version right before loosing my tubes at the
> airport :-)
> 
> https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=timers/nohz&id=921d8b81281ecdca686369f52165d04fa3505bd7
> 

This version looks good.

Reviewed-by: Srivatsa S. Bhat 

Regards,
Srivatsa S. Bhat


> I've not gotten wu build bot spam on it so it must be good ;-)
> 
> In any case, I'll add your tested-by and update later this evening.
> 



Re: [PATCH 2/6] irq_work: Implement remote queueing

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 03:49 PM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 03:24:11PM +0530, Srivatsa S. Bhat wrote:
>> Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
>> indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
>> doesn't need to invoke it again, AFAIU. So perhaps we can get rid of
>> irq_work_cpu_notify() altogether?
> 
> Just so...
> 
> getting up at 6am and sitting in an airport terminal doesn't seem to
> agree with me;

Haha :-)

> any more silly fail here?
> 

A few minor nits below..

> ---
> Subject: irq_work: Remove BUG_ON in irq_work_run()
> From: Peter Zijlstra 
> Date: Wed Jun 25 07:13:07 CEST 2014
> 
> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
> pending IPI callbacks before CPU offline"), which ends up calling
> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which
> is not from IRQ context.
> 
> And since that already calls irq_work_run() from the hotplug path,
> remove our entire hotplug handling.
> 
> Cc: Frederic Weisbecker 
> Reported-by: Stephen Warren 
> Signed-off-by: Peter Zijlstra 
> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
> ---
>  kernel/irq_work.c |   48 +---
>  1 file changed, 5 insertions(+), 43 deletions(-)
> 
> Index: linux-2.6/kernel/irq_work.c
> ===
> --- linux-2.6.orig/kernel/irq_work.c
> +++ linux-2.6/kernel/irq_work.c
> @@ -160,20 +160,14 @@ static void irq_work_run_list(struct lli
>   }
>  }
> 
> -static void __irq_work_run(void)
> -{
> - irq_work_run_list(&__get_cpu_var(raised_list));
> - irq_work_run_list(&__get_cpu_var(lazy_list));
> -}
> -
>  /*
> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> - * context with local IRQs disabled.
> + * hotplug calls this through:
> + *  hotplug_cfs() -> flush_smp_call_function_queue()

s/hotplug_cfs/hotplug_cfd

>   */
> -void irq_work_run(void)
> +static void irq_work_run(void)

s/static//

>  {
> - BUG_ON(!in_irq());
> - __irq_work_run();
> +     irq_work_run_list(&__get_cpu_var(raised_list));
> + irq_work_run_list(&__get_cpu_var(lazy_list));
>  }
>  EXPORT_SYMBOL_GPL(irq_work_run);

With those 2 changes, everything looks good to me.
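
(For clarity, with both nits applied the function ends up as below, which is what the queued
commit carries:)

/*
 * hotplug calls this through:
 *  hotplug_cfd() -> flush_smp_call_function_queue()
 */
void irq_work_run(void)
{
        irq_work_run_list(&__get_cpu_var(raised_list));
        irq_work_run_list(&__get_cpu_var(lazy_list));
}
EXPORT_SYMBOL_GPL(irq_work_run);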

Regards,
Srivatsa S. Bhat





Re: [PATCH 2/6] irq_work: Implement remote queueing

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 03:20 PM, Srivatsa S. Bhat wrote:
> On 06/25/2014 03:09 PM, Peter Zijlstra wrote:
>> On Wed, Jun 25, 2014 at 11:36:57AM +0200, Peter Zijlstra wrote:
>>> On Wed, Jun 25, 2014 at 12:07:05PM +0530, Srivatsa S. Bhat wrote:
>>>> I don't think irqs_disabled() is the problematic condition, since
>>>> hotplug_cfd() invokes irq_work_run() from CPU_DYING context (which has
>>>> irqs disabled). I guess you meant to remove the in_irq() check inside
>>>> irq_work_run() instead?
>>>
>>> Yes, clearly I should not get up at 6am.. :-) Let me go do a new one.
>>
>> ---
>> Subject: irq_work: Remove BUG_ON in irq_work_run()
>> From: Peter Zijlstra 
>> Date: Wed Jun 25 07:13:07 CEST 2014
>>
>> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
>> pending IPI callbacks before CPU offline"), which ends up calling
>> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which
>> is not from IRQ context.
>>
>> Cc: Frederic Weisbecker 
>> Reported-by: Stephen Warren 
>> Signed-off-by: Peter Zijlstra 
>> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
>> ---
>>  kernel/irq_work.c |   12 +---
>>  1 file changed, 1 insertion(+), 11 deletions(-)
>>
>> Index: linux-2.6/kernel/irq_work.c
>> ===
>> --- linux-2.6.orig/kernel/irq_work.c
>> +++ linux-2.6/kernel/irq_work.c
>> @@ -160,21 +160,11 @@ static void irq_work_run_list(struct lli
>>  }
>>  }
>>
>> -static void __irq_work_run(void)
> 
> Hmm, irq_work_cpu_notify() calls __irq_work_run() in the CPU_DYING
> phase, to by-pass BUG_ON(!in_irq()). How about doing the same thing
> from hotplug_cfd() as well?
> 

Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
doesn't need to invoke it again, AFAIU. So perhaps we can get rid of
irq_work_cpu_notify() altogether?
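
(The chain that makes the notifier redundant, sketched from the smp.c side of that series;
illustrative only, with the other hotplug cases omitted:)

static int hotplug_cfd(struct notifier_block *nfb, unsigned long action,
                       void *hcpu)
{
        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_DYING:
                /*
                 * Flush pending smp-call-function callbacks on the dying
                 * CPU; this also runs the per-CPU irq_work lists at the
                 * end, so a separate CPU_DYING handler in irq_work.c has
                 * nothing left to do.
                 */
                flush_smp_call_function_queue(false);
                break;
        }
        return NOTIFY_OK;
}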

Regards,
Srivatsa S. Bhat

>> +static void irq_work_run(void)
>>  {
>>  irq_work_run_list(&__get_cpu_var(raised_list));
>>  irq_work_run_list(&__get_cpu_var(lazy_list));
>>  }
>> -
>> -/*
>> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
>> - * context with local IRQs disabled.
>> - */
>> -void irq_work_run(void)
>> -{
>> -BUG_ON(!in_irq());
>> -__irq_work_run();
>> -}
>>  EXPORT_SYMBOL_GPL(irq_work_run);
>>
>>  /*
>>
> 



Re: [PATCH 2/6] irq_work: Implement remote queueing

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 03:09 PM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 11:36:57AM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 25, 2014 at 12:07:05PM +0530, Srivatsa S. Bhat wrote:
>>> I don't think irqs_disabled() is the problematic condition, since
>>> hotplug_cfd() invokes irq_work_run() from CPU_DYING context (which has
>>> irqs disabled). I guess you meant to remove the in_irq() check inside
>>> irq_work_run() instead?
>>
>> Yes, clearly I should not get up at 6am.. :-) Let me go do a new one.
> 
> ---
> Subject: irq_work: Remove BUG_ON in irq_work_run()
> From: Peter Zijlstra 
> Date: Wed Jun 25 07:13:07 CEST 2014
> 
> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
> pending IPI callbacks before CPU offline"), which ends up calling
> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which
> is not from IRQ context.
> 
> Cc: Frederic Weisbecker 
> Reported-by: Stephen Warren 
> Signed-off-by: Peter Zijlstra 
> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
> ---
>  kernel/irq_work.c |   12 +---
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> Index: linux-2.6/kernel/irq_work.c
> ===
> --- linux-2.6.orig/kernel/irq_work.c
> +++ linux-2.6/kernel/irq_work.c
> @@ -160,21 +160,11 @@ static void irq_work_run_list(struct lli
>   }
>  }
> 
> -static void __irq_work_run(void)

Hmm, irq_work_cpu_notify() calls __irq_work_run() in the CPU_DYING
phase, to by-pass BUG_ON(!in_irq()). How about doing the same thing
from hotplug_cfd() as well?

> +static void irq_work_run(void)
>  {
>   irq_work_run_list(&__get_cpu_var(raised_list));
>   irq_work_run_list(&__get_cpu_var(lazy_list));
>  }
> -
> -/*
> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> - * context with local IRQs disabled.
> - */
> -void irq_work_run(void)
> -{
> - BUG_ON(!in_irq());
> - __irq_work_run();
> -}
>  EXPORT_SYMBOL_GPL(irq_work_run);
> 
>  /*
> 

Regards,
Srivatsa S. Bhat



Re: [migration] kernel BUG at kernel/irq_work.c:175!

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 03:01 PM, Fengguang Wu wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
>

I think this is the same issue as the one reported by Stephen Warren
here:

https://lkml.org/lkml/2014/6/24/765

Peter Zijlstra is working on a fix for that.

Regards,
Srivatsa S. Bhat

 
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> commit 68c90b2c635f18ad51ae7440162f6c082ea1288d
> Merge: f08af6f ec11f8c
> Author: Stephen Rothwell 
> AuthorDate: Mon Jun 23 14:12:48 2014 +1000
> 
> Merge branch 'akpm-current/current'
> 
> +-++++---+
> | | f08af6fa87 | ec11f8c81f | 68c90b2c63 | 
> next-20140623 |
> +-++++---+
> | boot_successes  | 60 | 60 | 0  | 0  
>|
> | boot_failures   | 0  | 0  | 20 | 13 
>|
> | kernel_BUG_at_kernel/irq_work.c | 0  | 0  | 20 | 13 
>|
> | invalid_opcode  | 0  | 0  | 20 | 13 
>|
> | RIP:irq_work_run| 0  | 0  | 20 | 13 
>|
> | backtrace:smpboot_thread_fn | 0  | 0  | 20 | 13 
>|
> +-++++---+
> 
> [2.194744] EDD information not available.
> [2.195290] Unregister pv shared memory for cpu 0
> [2.206025] [ cut here ]
> [2.206025] kernel BUG at kernel/irq_work.c:175!
> [2.206025] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
> [2.206025] CPU: 0 PID: 9 Comm: migration/0 Not tainted 
> 3.16.0-rc2-02039-g68c90b2 #1
> [2.206025] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [2.206025] task: 88001219a7e0 ti: 8800121a4000 task.ti: 
> 8800121a4000
> [2.206025] RIP: 0010:[]  [] 
> irq_work_run+0xf/0x1c
> [2.206025] RSP: :8800121a7c48  EFLAGS: 00010046
> [2.206025] RAX: 8001 RBX:  RCX: 
> 0005
> [2.206025] RDX:  RSI: 0008 RDI: 
> 
> [2.206025] RBP: 8800121a7c68 R08: 0002 R09: 
> 0001
> [2.206025] R10: 810e2a10 R11: 810b9de3 R12: 
> 880012412340
> [2.206025] R13:  R14:  R15: 
> 81c83e50
> [2.206025] FS:  () GS:88001240() 
> knlGS:
> [2.206025] CS:  0010 DS:  ES:  CR0: 8005003b
> [2.206025] CR2:  CR3: 01c0c000 CR4: 
> 06b0
> [2.206025] Stack:
> [2.206025]  810e87e0 880012412380 fff0 
> 81c81ba0
> [2.206025]  8800121a7c88 810e88f0 0001 
> fff0
> [2.206025]  8800121a7cd0 810b6e23  
> 0008
> [2.206025] Call Trace:
> [2.206025]  [] ? 
> flush_smp_call_function_queue+0xa4/0x107
> [2.206025]  [] hotplug_cfd+0xad/0xbb
> [2.206025]  [] notifier_call_chain+0x68/0x8e
> [2.206025]  [] __raw_notifier_call_chain+0x9/0xb
> [2.206025]  [] __cpu_notify+0x1b/0x32
> [2.206025]  [] cpu_notify+0xe/0x10
> [2.206025]  [] take_cpu_down+0x22/0x35
> [2.206025]  [] multi_cpu_stop+0x8c/0xe2
> [2.206025]  [] ? cpu_stopper_thread+0x126/0x126
> [2.206025]  [] cpu_stopper_thread+0x8d/0x126
> [2.206025]  [] ? lock_acquire+0x94/0x9d
> [2.206025]  [] ? _raw_spin_unlock_irqrestore+0x40/0x55
> [2.206025]  [] ? trace_hardirqs_on_caller+0x171/0x18d
> [2.206025]  [] ? _raw_spin_unlock_irqrestore+0x48/0x55
> [2.206025]  [] smpboot_thread_fn+0x182/0x1a0
> [2.206025]  [] ? in_egroup_p+0x2e/0x2e
> [2.206025]  [] kthread+0xcd/0xd5
> [2.206025]  [] ? __kthread_parkme+0x5c/0x5c
> [2.206025]  [] ret_from_fork+0x7c/0xb0
> [2.206025]  [] ? __kthread_parkme+0x5c/0x5c
> [2.206025] Code: 48 c7 c7 65 cd b0 81 e8 43 20 fa ff c6 05 50 e1 c9 00 01 
> eb 02 31 db 88 d8 5b 5d c3 65 8b 04 25 10 b8 00 00 a9 00 00 0f 00 75 02 <0f> 
> 0b 55 48 89 e5 e8 b5 fd ff ff 5d c3 55 48 89 e5 53 48 89 fb 
> [2.206025] RIP  [] irq_work_run+0xf/0x1c
> [2.206025]  RSP 
> [2.206025] ---[ end trace f7f1564c3a1f35d0 ]---
> [2.206025] note: migration/0[9] exited with preempt_count 1
> 
> git bisect start 58ae500a03a6bf68eee323c342431bfdd3f460b6 
> f08af6fa87ea33

Re: [PATCH 2/6] irq_work: Implement remote queueing

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 10:47 AM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 07:12:34AM +0200, Peter Zijlstra wrote:
>> On Tue, Jun 24, 2014 at 02:33:41PM -0600, Stephen Warren wrote:
>>> On 06/10/2014 09:15 AM, Frederic Weisbecker wrote:
>>>> irq work currently only supports local callbacks. However its code
>>>> is mostly ready to run remote callbacks and we have some potential user.

[...]

>> Right you are.. I think I'll just remove the BUG_ON(), Frederic?
> 
> Something a little so like:
> 
> ---
> Subject: irq_work: Remove BUG_ON in irq_work_run_list()

I think this should be irq_work_run(), see below...

> From: Peter Zijlstra 
> Date: Wed Jun 25 07:13:07 CEST 2014
> 
> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
> pending IPI callbacks before CPU offline"), which ends up calling
> hotplug_cfd()->flush_smp_call_function_queue()->run_irq_work(), which


s/run_irq_work/irq_work_run


> is not from IRQ context.
> 
> Cc: Frederic Weisbecker 
> Reported-by: Stephen Warren 
> Signed-off-by: Peter Zijlstra 
> ---
>  kernel/irq_work.c |2 --
>  1 file changed, 2 deletions(-)
> 
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -130,8 +130,6 @@ static void irq_work_run_list(struct lli
>   struct irq_work *work;
>   struct llist_node *llnode;
>  
> - BUG_ON(!irqs_disabled());
> -

I don't think irqs_disabled() is the problematic condition, since
hotplug_cfd() invokes irq_work_run() from CPU_DYING context (which has
irqs disabled). I guess you meant to remove the in_irq() check inside
irq_work_run() instead?

Regards,
Srivatsa S. Bhat

>   if (llist_empty(list))
>   return;
>  
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/6] irq_work: Implement remote queueing

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 03:20 PM, Srivatsa S. Bhat wrote:
 On 06/25/2014 03:09 PM, Peter Zijlstra wrote:
 On Wed, Jun 25, 2014 at 11:36:57AM +0200, Peter Zijlstra wrote:
 On Wed, Jun 25, 2014 at 12:07:05PM +0530, Srivatsa S. Bhat wrote:
 I don't think irqs_disabled() is the problematic condition, since
 hotplug_cfg() invokes irq_work_run() from CPU_DYING context (which has
 irqs disabled). I guess you meant to remove the in_irq() check inside
 irq_work_run() instead?

 Yes, clearly I should not get up at 6am.. :-) Let me go do a new one.

 ---
 Subject: irq_work: Remove BUG_ON in irq_work_run()
 From: Peter Zijlstra pet...@infradead.org
 Date: Wed Jun 25 07:13:07 CEST 2014

 Because of a collision with 8d056c48e486 (CPU hotplug, smp: flush any
 pending IPI callbacks before CPU offline), which ends up calling
hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which
 is not from IRQ context.

 Cc: Frederic Weisbecker fweis...@gmail.com
 Reported-by: Stephen Warren swar...@wwwdotorg.org
 Signed-off-by: Peter Zijlstra pet...@infradead.org
 Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
 ---
  kernel/irq_work.c |   12 +---
  1 file changed, 1 insertion(+), 11 deletions(-)

 Index: linux-2.6/kernel/irq_work.c
 ===
 --- linux-2.6.orig/kernel/irq_work.c
 +++ linux-2.6/kernel/irq_work.c
 @@ -160,21 +160,11 @@ static void irq_work_run_list(struct lli
  }
  }

 -static void __irq_work_run(void)
 
 Hmm, irq_work_cpu_notify() calls __irq_work_run() in the CPU_DYING
 phase, to by-pass BUG_ON(!in_irq()). How about doing the same thing
 from hotplug_cfd() as well?
 

Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
doesn't need to invoke it again, AFAIU. So perhaps we can get rid of
irq_work_cpu_notify() altogether?

Regards,
Srivatsa S. Bhat

 +static void irq_work_run(void)
  {
  irq_work_run_list(&__get_cpu_var(raised_list));
  irq_work_run_list(&__get_cpu_var(lazy_list));
  }
 -
 -/*
 - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
 - * context with local IRQs disabled.
 - */
 -void irq_work_run(void)
 -{
 -BUG_ON(!in_irq());
 -__irq_work_run();
 -}
  EXPORT_SYMBOL_GPL(irq_work_run);

  /*

 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/6] irq_work: Implement remote queueing

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 03:49 PM, Peter Zijlstra wrote:
 On Wed, Jun 25, 2014 at 03:24:11PM +0530, Srivatsa S. Bhat wrote:
 Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
 indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
 doesn't need to invoke it again, AFAIU. So perhaps we can get rid of
 irq_work_cpu_notify() altogether?
 
 Just so...
 
 getting up at 6am and sitting in an airport terminal doesn't seem to
 agree with me;

Haha :-)

 any more silly fail here?
 

A few minor nits below..

 ---
 Subject: irq_work: Remove BUG_ON in irq_work_run()
 From: Peter Zijlstra pet...@infradead.org
 Date: Wed Jun 25 07:13:07 CEST 2014
 
 Because of a collision with 8d056c48e486 (CPU hotplug, smp: flush any
 pending IPI callbacks before CPU offline), which ends up calling
hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which
 is not from IRQ context.
 
 And since that already calls irq_work_run() from the hotplug path,
 remove our entire hotplug handling.
 
 Cc: Frederic Weisbecker fweis...@gmail.com
 Reported-by: Stephen Warren swar...@wwwdotorg.org
 Signed-off-by: Peter Zijlstra pet...@infradead.org
 Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
 ---
  kernel/irq_work.c |   48 +---
  1 file changed, 5 insertions(+), 43 deletions(-)
 
 Index: linux-2.6/kernel/irq_work.c
 ===
 --- linux-2.6.orig/kernel/irq_work.c
 +++ linux-2.6/kernel/irq_work.c
 @@ -160,20 +160,14 @@ static void irq_work_run_list(struct lli
   }
  }
 
 -static void __irq_work_run(void)
 -{
 - irq_work_run_list(&__get_cpu_var(raised_list));
 - irq_work_run_list(&__get_cpu_var(lazy_list));
 -}
 -
  /*
 - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
 - * context with local IRQs disabled.
 + * hotplug calls this through:
 + *  hotplug_cfs() -> flush_smp_call_function_queue()

s/hotplug_cfs/hotplug_cfd

   */
 -void irq_work_run(void)
 +static void irq_work_run(void)

s/static//

  {
 - BUG_ON(!in_irq());
 - __irq_work_run();
 + irq_work_run_list(&__get_cpu_var(raised_list));
 + irq_work_run_list(&__get_cpu_var(lazy_list));
  }
  EXPORT_SYMBOL_GPL(irq_work_run);

With those 2 changes, everything looks good to me.
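
(For reference, a sketch of how that hunk would read with both nits folded in;
this is reconstructed from the diff quoted above, so treat it as an
approximation rather than the committed code:)

/*
 * hotplug calls this through:
 *  hotplug_cfd() -> flush_smp_call_function_queue()
 */
void irq_work_run(void)
{
	irq_work_run_list(&__get_cpu_var(raised_list));
	irq_work_run_list(&__get_cpu_var(lazy_list));
}
EXPORT_SYMBOL_GPL(irq_work_run);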

Regards,
Srivatsa S. Bhat



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/6] irq_work: Implement remote queueing

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 10:08 PM, Peter Zijlstra wrote:
 On Wed, Jun 25, 2014 at 10:23:21AM -0600, Stephen Warren wrote:
 On 06/25/2014 04:19 AM, Peter Zijlstra wrote:
 On Wed, Jun 25, 2014 at 03:24:11PM +0530, Srivatsa S. Bhat wrote:
 Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
 indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
 doesn't need to invoke it again, AFAIU. So perhaps we can get rid of
 irq_work_cpu_notify() altogether?

 Just so...

 getting up at 6am and sitting in an airport terminal doesn't seem to
 agree with me; any more silly fail here?

 ---
 Subject: irq_work: Remove BUG_ON in irq_work_run()
 From: Peter Zijlstra pet...@infradead.org
 Date: Wed Jun 25 07:13:07 CEST 2014

 Because of a collision with 8d056c48e486 (CPU hotplug, smp: flush any
 pending IPI callbacks before CPU offline), which ends up calling
hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which
 is not from IRQ context.

 And since that already calls irq_work_run() from the hotplug path,
 remove our entire hotplug handling.

 Tested-by: Stephen Warren swar...@nvidia.com

 [with the s/static// already mentioned in this thread, obviously:-)]
 
 Right; I pushed out a fixed version right before loosing my tubes at the
 airport :-)
 
 https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=timers/nohz&id=921d8b81281ecdca686369f52165d04fa3505bd7
 

This version looks good.

Reviewed-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

Regards,
Srivatsa S. Bhat


 I've not gotten wu build bot spam on it so it must be good ;-)
 
 In any case, I'll add your tested-by and update later this evening.
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 2/2] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline

2014-06-25 Thread Srivatsa S. Bhat
On 06/25/2014 09:12 PM, Sasha Levin wrote:
 On 05/26/2014 07:08 AM, Srivatsa S. Bhat wrote:
 During CPU offline, in stop-machine, we don't enforce any rule in the
 _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the 
 other
 CPUs disable their local interrupts. Hence, we can encounter a scenario as
 depicted below, in which IPIs are sent by the other CPUs to the CPU going
 offline (while it is *still* online), but the outgoing CPU notices them only
 *after* it has gone offline.

[...]
 Hi all,
 
 While fuzzing with trinity inside a KVM tools guest running the latest -next
 kernel I've stumbled on the following spew:
 

Thanks for the bug report. Please test if this patch fixes the problem
for you:

https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=timers/nohz&id=921d8b81281ecdca686369f52165d04fa3505bd7

Regards,
Srivatsa S. Bhat

 [ 1982.600053] kernel BUG at kernel/irq_work.c:175!
 [ 1982.600053] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
 [ 1982.600053] Dumping ftrace buffer:
 [ 1982.600053](ftrace buffer empty)
 [ 1982.600053] Modules linked in:
 [ 1982.600053] CPU: 14 PID: 168 Comm: migration/14 Not tainted 
 3.16.0-rc2-next-20140624-sasha-00024-g332b58d #726
 [ 1982.600053] task: 88036a5a3000 ti: 88036a5ac000 task.ti: 
 88036a5ac000
 [ 1982.600053] RIP: irq_work_run (kernel/irq_work.c:175 (discriminator 1))
 [ 1982.600053] RSP: :88036a5afbe0  EFLAGS: 00010046
 [ 1982.600053] RAX: 8001 RBX:  RCX: 
 0008
 [ 1982.600053] RDX: 000e RSI: af9185fb RDI: 
 
 [ 1982.600053] RBP: 88036a5afc08 R08: 00099224 R09: 
 
 [ 1982.600053] R10:  R11: 0001 R12: 
 88036afd8400
 [ 1982.600053] R13:  R14: b0cf8120 R15: 
 b0cce5d0
 [ 1982.600053] FS:  () GS:88036ae0() 
 knlGS:
 [ 1982.600053] CS:  0010 DS:  ES:  CR0: 8005003b
 [ 1982.600053] CR2: 019485d0 CR3: 0002c7c8f000 CR4: 
 06a0
 [ 1982.600053] Stack:
 [ 1982.600053]  ab20fbb5 0082 88036afd8440 
 
 [ 1982.600053]  0001 88036a5afc28 ab20fca7 
 
 [ 1982.600053]  ffef 88036a5afc78 ab19c58e 
 000e
 [ 1982.600053] Call Trace:
 [ 1982.600053] ? flush_smp_call_function_queue (kernel/smp.c:263)
 [ 1982.600053] hotplug_cfd (kernel/smp.c:81)
 [ 1982.600053] notifier_call_chain (kernel/notifier.c:95)
 [ 1982.600053] __raw_notifier_call_chain (kernel/notifier.c:395)
 [ 1982.600053] __cpu_notify (kernel/cpu.c:202)
 [ 1982.600053] cpu_notify (kernel/cpu.c:211)
 [ 1982.600053] take_cpu_down (./arch/x86/include/asm/current.h:14 
 kernel/cpu.c:312)
 [ 1982.600053] multi_cpu_stop (kernel/stop_machine.c:201)
 [ 1982.600053] ? __stop_cpus (kernel/stop_machine.c:170)
 [ 1982.600053] cpu_stopper_thread (kernel/stop_machine.c:474)
 [ 1982.600053] ? put_lock_stats.isra.12 (./arch/x86/include/asm/preempt.h:98 
 kernel/locking/lockdep.c:254)
 [ 1982.600053] ? _raw_spin_unlock_irqrestore 
 (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 
 kernel/locking/spinlock.c:191)
 [ 1982.600053] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
 [ 1982.600053] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 
 kernel/locking/lockdep.c:2599)
 [ 1982.600053] smpboot_thread_fn (kernel/smpboot.c:160)
 [ 1982.600053] ? __smpboot_create_thread (kernel/smpboot.c:105)
 [ 1982.600053] kthread (kernel/kthread.c:210)
 [ 1982.600053] ? wait_for_completion (kernel/sched/completion.c:77 
 kernel/sched/completion.c:93 kernel/sched/completion.c:101 
 kernel/sched/completion.c:122)
 [ 1982.600053] ? kthread_create_on_node (kernel/kthread.c:176)
 [ 1982.600053] ret_from_fork (arch/x86/kernel/entry_64.S:349)
 [ 1982.600053] ? kthread_create_on_node (kernel/kthread.c:176)
 [ 1982.600053] Code: 00 00 00 00 e8 63 ff ff ff 48 83 c4 08 b8 01 00 00 00 5b 
 5d c3 b8 01 00 00 00 c3 90 65 8b 04 25 a0 da 00 00 a9 00 00 0f 00 75 09 0f 
 0b 0f 1f 80 00 00 00 00 55 48 89 e5 e8 2f ff ff ff 5d c3 66
 All code
 
0: 00 00   add%al,(%rax)
2: 00 00   add%al,(%rax)
4: e8 63 ff ff ff  callq  0xff6c
9: 48 83 c4 08 add$0x8,%rsp
d: b8 01 00 00 00  mov$0x1,%eax
   12: 5b  pop%rbx
   13: 5d  pop%rbp
   14: c3  retq
   15: b8 01 00 00 00  mov$0x1,%eax
   1a: c3  retq
   1b: 90  nop
   1c: 65 8b 04 25 a0 da 00mov%gs:0xdaa0,%eax
   23: 00
   24: a9 00 00 0f 00  test   $0xf,%eax
   29: 75 09   jne0x34
   2b:*0f 0b   ud2 -- trapping instruction
   2d: 0f 1f 80 00 00

Re: Boot warnings on exynos5420 based boards

2014-06-17 Thread Srivatsa S. Bhat
On 06/17/2014 05:25 PM, Sachin Kamat wrote:
> Hi Srivatsa,
> 
> On Tue, Jun 17, 2014 at 3:24 PM, Srivatsa S. Bhat
>  wrote:
>> On 06/17/2014 03:03 PM, Sachin Kamat wrote:
>>
>>>> Below is an updated patch, please let me know how it goes. (You'll have to
>>>> revert c47a9d7cca first, and then 56e692182, before trying this patch).
>>>
>>> I am unable to apply your below patch on top of the above 2 reverts.
>>> Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU 
>>> offline
>>> fatal: corrupt patch at line 106
>>> Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI
>>> callbacks before CPU offline
>>>
>>> Even with 'patch' I get the below failures:
>>> patching file kernel/smp.c
>>> Hunk #2 FAILED at 53.
>>> Hunk #3 FAILED at 179.
>>> 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej
>>>
>>
>> Hmm, weird. My mailer must have screwed it up.
>>
>> Let's try again:
>>
>> [In case this also doesn't work for you, please use this git tree in which
>>  I have reverted the 2 old commits and added this updated patch.
>>
>>  git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3
>> ]
> 
> Unfortunately the attached patch did not apply either. Nevertheless, I
> applied the
> patch from your above mentioned tree. With that patch I do not see the 
> warnings
> that I mentioned in my first mail. Thanks for fixing it.
> 

Sure, thanks for reporting the bug and testing the updated patch!
By the way, I think there is some problem in the workflow that you use to
copy-paste/apply the patch. I tried applying both patches (that I sent in
2 different mails) and both applied properly without any problems.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Boot warnings on exynos5420 based boards

2014-06-17 Thread Srivatsa S. Bhat
On 06/17/2014 03:03 PM, Sachin Kamat wrote:

>> Below is an updated patch, please let me know how it goes. (You'll have to
>> revert c47a9d7cca first, and then 56e692182, before trying this patch).
> 
> I am unable to apply your below patch on top of the above 2 reverts.
> Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU 
> offline
> fatal: corrupt patch at line 106
> Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI
> callbacks before CPU offline
> 
> Even with 'patch' I get the below failures:
> patching file kernel/smp.c
> Hunk #2 FAILED at 53.
> Hunk #3 FAILED at 179.
> 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej
> 

Hmm, weird. My mailer must have screwed it up.

Let's try again:

[In case this also doesn't work for you, please use this git tree in which
 I have reverted the 2 old commits and added this updated patch.

 git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3
]

--------
From: Srivatsa S. Bhat 

[PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline

There is a race between the CPU offline code (within stop-machine) and the
smp-call-function code, which can lead to getting IPIs on the outgoing CPU,
*after* it has gone offline.

Specifically, this can happen when using smp_call_function_single_async() to
send the IPI, since this API allows sending asynchronous IPIs from IRQ
disabled contexts. The exact race condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the
other CPUs disable their local interrupts. Due to this, we can encounter a
situation in which an IPI is sent by one of the other CPUs to the outgoing CPU
(while it is *still* online), but the outgoing CPU ends up noticing it only
*after* it has gone offline.

  CPU 1 CPU 2
  (Online CPU)   (CPU going offline)

   Enter _PREPARE stage  Enter _PREPARE stage

 Enter _DISABLE_IRQ stage

   =
   Got a device interrupt, and | Didn't notice the IPI
   the interrupt handler sent an   | since interrupts were
   IPI to CPU 2 using  | disabled on this CPU.
   smp_call_function_single_async()|
   =

   Enter _DISABLE_IRQ stage

   Enter _RUN stage  Enter _RUN stage

  =
   Busy loop with interrupts  |  Invoke take_cpu_down()
   disabled.  |  and take CPU 2 offline
  =

   Enter _EXIT stage Enter _EXIT stage

   Re-enable interrupts  Re-enable interrupts

 The pending IPI is noted
 immediately, but alas,
 the CPU is offline at
 this point.

This of course, makes the smp-call-function IPI handler code running on CPU 2
unhappy and it complains about "receiving an IPI on an offline CPU".
One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:
__blk_complete_request() [interrupt-handler]
raise_blk_irq()
smp_call_function_single_async()


However, if we look closely, the block layer does check that the target CPU is
online before firing the IPI. So in this case, it is actually the unfortunate
ordering/timing of events in the stop-machine phase that leads to receiving
IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by itself
(this can happen even due to hardware latencies in IPI send-receive). It is
a bug only if the target CPU really went offline without executing all the
callbacks queued on its list. (Note that a CPU is free to execute its pending
smp-call-function callbacks in a batch, without waiting for the corresponding
IPIs to arrive for each one of those callbacks).

So, fixing this issue can be broken up into two parts:
1. Ensure that a CPU goes offline only after executing all the callbacks
   queued on it.
2. Modify the warning condition in the smp-call-function IPI handler code
   such that it warns only if an offline CPU got an IPI *and* that CPU had
   gone offline with callbacks still pending in its queue.

Achieving part 1 is straight-forward - just flush (execute) all the queued
callbacks on the outgoing CP

Re: Boot warnings on exynos5420 based boards

2014-06-17 Thread Srivatsa S. Bhat
Hi Sachin,

On 06/17/2014 01:39 PM, Sachin Kamat wrote:
> Hi,
> 
> I observe the below warnings while trying to boot Exynos5420 based boards
> since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks

I guess you meant next-20140617.

> like it is triggered by the commit 56e6921829 ("CPU hotplug, smp:
> flush any pending IPI callbacks before CPU offline"). Any ideas?
> 
> 
> *
> [0.046521] Exynos MCPM support installed
> [0.048939] CPU1: Booted secondary processor
> [0.065005] CPU1: update cpu_capacity 1535
> [0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
> [0.065660] CPU2: Booted secondary processor
> [0.085005] CPU2: update cpu_capacity 1535
> [0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002
> [0.085662] CPU3: Booted secondary processor
> [0.105005] CPU3: update cpu_capacity 1535
> [0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003
> [1.105031] CPU4: failed to come online
> [1.105081] [ cut here ]
> [1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228
> flush_smp_call_function_queue+0xc0/0x178()
> [1.105112] Modules linked in:
> [1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 3.15.0-next-20140616-2-g38f9385a061b #2035
> [1.105157] [] (unwind_backtrace) from []
> (show_stack+0x10/0x14)
> [1.105179] [] (show_stack) from []
> (dump_stack+0x8c/0x9c)
> [1.105198] [] (dump_stack) from []
> (warn_slowpath_common+0x70/0x8c)
> [1.105216] [] (warn_slowpath_common) from []
> (warn_slowpath_null+0x1c/0x24)
> [1.105235] [] (warn_slowpath_null) from []
> (flush_smp_call_function_queue+0xc0/0x178)
> [1.105253] [] (flush_smp_call_function_queue) from
> [] (hotplug_cfd+0x98/0xd8)
> [1.105269] [] (hotplug_cfd) from []
> (notifier_call_chain+0x44/0x84)
> [1.105285] [] (notifier_call_chain) from []
> (_cpu_up+0x120/0x170)
> [1.105302] [] (_cpu_up) from [] (cpu_up+0x70/0x94)
> [1.105319] [] (cpu_up) from [] (smp_init+0xac/0xb0)
> [1.105337] [] (smp_init) from []
> (kernel_init_freeable+0x90/0x1dc)
> [1.105353] [] (kernel_init_freeable) from []
> (kernel_init+0xc/0xe8)
> [1.105368] [] (kernel_init) from []
> (ret_from_fork+0x14/0x3c)
> [1.105389] ---[ end trace bc66942e4ab63168 ]---

Argh! I had put the switch-case handling for CPU_DYING at the 'wrong' place,
since I hadn't noticed that CPU_UP_CANCELED silently falls through to CPU_DEAD.
This is what happens when people don't explicitly write "fall-through" in the
comments in a switch-case statement :-(
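
(Purely as an illustration of that pitfall -- the helper names below are
made-up placeholders, not the real kernel/smp.c code. The point is that
CPU_UP_CANCELED has no break and relies on falling through to CPU_DEAD, so a
new CPU_DYING case must not be slotted in between them:)

static int hotplug_cfd_sketch(struct notifier_block *nfb,
			      unsigned long action, void *hcpu)
{
	int cpu = (long)hcpu;

	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_DYING:
		/* right place: flush pending callbacks on the outgoing CPU */
		flush_pending_ipi_callbacks(cpu);	/* placeholder */
		break;

	case CPU_UP_CANCELED:
		/* fall through */
	case CPU_DEAD:
		cleanup_cfd_data(cpu);			/* placeholder */
		break;
	}

	return NOTIFY_OK;
}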

Below is an updated patch, please let me know how it goes. (You'll have to
revert c47a9d7cca first, and then 56e692182, before trying this patch).

[c47a9d7cca - CPU hotplug, smp: Execute any pending IPI callbacks before CPU
  offline]
[56e692182  - CPU hotplug, smp: flush any pending IPI callbacks before CPU
      offline]

Andrew, can you please use this patch instead?

Thanks a lot!

--

From: Srivatsa S. Bhat 
[PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline

There is a race between the CPU offline code (within stop-machine) and the
smp-call-function code, which can lead to getting IPIs on the outgoing CPU,
*after* it has gone offline.

Specifically, this can happen when using smp_call_function_single_async() to
send the IPI, since this API allows sending asynchronous IPIs from IRQ
disabled contexts. The exact race condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the
other CPUs disable their local interrupts. Due to this, we can encounter a
situation in which an IPI is sent by one of the other CPUs to the outgoing CPU
(while it is *still* online), but the outgoing CPU ends up noticing it only
*after* it has gone offline.

  CPU 1 CPU 2
  (Online CPU)   (CPU going offline)

   Enter _PREPARE stage  Enter _PREPARE stage

 Enter _DISABLE_IRQ stage

   =
   Got a device interrupt, and | Didn't notice the IPI
   the interrupt handler sent an   | since interrupts were
   IPI to CPU 2 using  | disabled on this CPU.
   smp_call_function_single_async()|
   =

   Enter _DISABLE_IRQ stage

   Enter _RUN stage  Enter _RUN stage

  

Re: [CPU hotplug, smp] WARNING: CPU: 1 PID: 0 at kernel/smp.c:209 generic_smp_call_function_single_interrupt()

2014-06-15 Thread Srivatsa S. Bhat
On 06/15/2014 12:56 PM, Jet Chen wrote:
> Hi Srivatsa,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> commit ab7a42783d939cdbe729c18ab32dbf0d25746ea2
> Author: Srivatsa S. Bhat 
> AuthorDate: Thu May 22 10:44:06 2014 +1000
> Commit: Stephen Rothwell 
> CommitDate: Thu May 22 10:44:06 2014 +1000
> 
> CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
> During CPU offline, in the stop-machine loop, we use 2 separate 
> stages to
> disable interrupts, to ensure that the CPU going offline doesn't get any
> new IPIs from the other CPUs after it has gone offline.
> However, an IPI sent much earlier might arrive late on the target CPU
> (possibly _after_ the CPU has gone offline) due to hardware latencies, and
> due to this, the smp-call-function callbacks queued on the outgoing CPU
> might not get noticed (and hence not executed) at all.
> This is somewhat theoretical, but in any case, it makes sense to
> explicitly loop through the call_single_queue and flush any pending
> callbacks before the CPU goes completely offline.  So, flush the queued
> smp-call-function callbacks in the MULTI_STOP_DISABLE_IRQ_ACTIVE stage,
> after disabling interrupts on the active CPU.  This can be trivially
> achieved by invoking the generic_smp_call_function_single_interrupt()
> function itself (and since the outgoing CPU is still online at this point,
> we won't trigger the "IPI to offline CPU" warning in this function; so we
> are safe to call it here).
> This way, we would have handled all the queued callbacks before going
> offline, and also, no new IPIs can be sent by the other CPUs to the
> outgoing CPU at that point, because they will all be executing the
> stop-machine code with interrupts disabled.
> Signed-off-by: Srivatsa S. Bhat 
> Suggested-by: Frederic Weisbecker 
> Reviewed-by: Tejun Heo 
> Cc: Peter Zijlstra 
> Cc: Oleg Nesterov 
> Signed-off-by: Andrew Morton 
> 

Thanks for reporting this, but this patch has been superseded by an updated
version of the patch and you can find that here:
https://lkml.org/lkml/2014/6/10/589

Also, this (bad) patch (which you bisected to) is not in linux-next at the
moment. So I guess you tested a slightly old version of linux-next.

Thank you!

Regards,
Srivatsa S. Bhat

> 
> +--+++
> | 
>  | c80d40e1f2 | ab7a42783d |
> +--+++
> | boot_successes  
>  | 1194   | 261|
> | boot_failures   
>  | 6  | 39 |
> | BUG:kernel_test_crashed 
>  | 6  ||
> | 
> WARNING:CPU:PID:at_kernel/smp.c:generic_smp_call_function_single_interrupt() 
> | 0  | 39 |
> | backtrace:stop_machine_from_inactive_cpu
>  | 0  | 39 |
> | backtrace:mtrr_ap_init  
>  | 0  | 39 |
> | general_protection_fault
>  | 0  | 0  |
> | RIP:__lock_acquire  
>  | 0  | 0  |
> | BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c 
>  | 0  | 0  |
> | INFO:lockdep_is_turned_off  
>  | 0  | 0  |
> | backtrace:do_mount  
>  | 0  | 0  |
> | backtrace:SyS_mount 
>  | 0  | 0  |
> +--+++
> 
> [   62.119017] masked ExtINT on CPU#1
> [   62.119017] numa_add_cpu cpu 1 node 0: mask now 0-1
> [   62.261606] [ cut here ]
> [   62.261606] WARNING: CPU: 1 PID: 0 at kernel/smp.c:209 
> generic_smp_call_function_single_interrupt+0xc5/0x155()
> [   62.261606] IPI on offline CPU 1
> [   62.261606] Modules linked in:
> [   62.261606] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
> 3.15.0-rc5-01473-gab7a427 #4510
> [ 

Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

2014-06-12 Thread Srivatsa S. Bhat
Hi Joel,

On 06/12/2014 12:09 PM, Joel Stanley wrote:
> Hi Srivatsa,
> 
> On Sat, Jun 7, 2014 at 7:16 AM, Srivatsa S. Bhat
>  wrote:
>> And with the following hunk added (which I had forgotten earlier), it worked 
>> just
>> fine on powernv :-)
> 
> How are the patches coming along?
> 

I'm still waiting to test this patch series on a PowerVM box, and unfortunately
there are some machine issues to debug first :-( So that's why this is taking
time... :-(

> I just hung a machine here while attempting to kexec. It appears to
> have onlined all of the secondary threads, and then hung here:
> 
> kexec: Waking offline cpu 1.
> kvm: enabling virtualization on CPU1
> kexec: Waking offline cpu 2.
> kvm: enabling virtualization on CPU2
> kexec: Waking offline cpu 3.
> kvm: enabling virtualization on CPU3
> kexec: Waking offline cpu 5.
> kvm: enabling virtualization on CPU5
> [...]
> kvm: enabling virtualization on CPU63
> kexec: waiting for cpu 1 (physical 1) to enter OPAL
> kexec: waiting for cpu 2 (physical 2) to enter OPAL
> kexec: waiting for cpu 3 (physical 3) to enter OPAL
> 
> I'm running benh's next branch as of thismorning, and SMT was off.
> 

Oh! This looks like a different hang than the one I tried to fix. My patch
("powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode")
which is already in benh's next branch was aimed at fixing the "CPU is stuck"
issue which was observed during the second kernel boot. If the first kernel
itself is hanging in the down-path, then it looks like a different problem
altogether.

> Could you please post your latest patches as a series? I will test them here.
> 

The 4 patches that I proposed in this thread are aimed at making the above
solution more elegant, by not having to actually online the secondary threads
while doing kexec. I don't think it will solve the hang that you are seeing.
In any case, I'll provide the consolidated patch below if you want to give it
a try.

By the way, I have a few questions regarding the hang you observed: is it
always reproducible with SMT=off? And if SMT was 8 (i.e., all CPUs in the system
were online) and then you did a kexec, do you still see the hang?

Regards,
Srivatsa S. Bhat

---

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 16d7e33..2a31b52 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -68,6 +68,7 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
ppc_save_regs(newregs);
 }
 
+extern bool kexec_cpu_wake(void);
 extern void kexec_smp_wait(void);  /* get and clear naca physid, wait for
  master to copy new code to 0 */
 extern int crashing_cpu;
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index f92b0b5..39f721d 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -255,6 +255,16 @@ struct machdep_calls {
void (*machine_shutdown)(void);
 
 #ifdef CONFIG_KEXEC
+#if (defined CONFIG_PPC64) && (defined CONFIG_PPC_BOOK3S)
+
+   /*
+* The pseries and powernv book3s platforms have a special requirement
+* that soft-offline CPUs have to be woken up before kexec, to avoid
+* CPUs getting stuck. This callback prepares the system for the
+* impending wakeup of the offline CPUs.
+*/
+   void (*kexec_wake_prepare)(void);
+#endif
void (*kexec_cpu_down)(int crash_shutdown, int secondary);
 
/* Called to do what every setup is needed on image and the
diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 879b3aa..84e91293 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -182,6 +182,14 @@ static void kexec_smp_down(void *arg)
/* NOTREACHED */
 }
 
+bool kexec_cpu_wake(void)
+{
+   kexec_smp_down(NULL);
+
+   /* NOTREACHED */
+   return true;
+}
+
 static void kexec_prepare_cpus_wait(int wait_state)
 {
int my_cpu, i, notified=-1;
@@ -202,7 +210,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
 * these possible-but-not-online-but-should-be CPUs and chaperone them
 * into kexec_smp_wait().
 */
-   for_each_online_cpu(i) {
+   for_each_present_cpu(i) {
if (i == my_cpu)
continue;
 
@@ -228,16 +236,22 @@ static void kexec_prepare_cpus_wait(int wait_state)
  * threads as offline -- and again, these CPUs will be stuck.
  *
  * So, we online all CPUs that should be running, including secondary threads.
+ *
+ * TODO: Update this comment
  */
 static void wake_offline_cpus(void)
 {
int cpu = 0;
 
+   if (ppc_md.kexec_wake_prepare)
+   ppc_md.kexec_wake_prepare();
+
    for_each_present_cpu(cpu) {
        if (!cpu_online(cpu
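
To make the intended use of the new hook a little more concrete, here is a
minimal sketch of how a Book3S platform could register it. This is only an
illustration on top of the patch above; the callback name, the helper and its
placement are my assumptions, not part of the series:

#include <linux/init.h>
#include <asm/machdep.h>

#ifdef CONFIG_KEXEC
/*
 * Hypothetical platform callback: do whatever firmware/hypervisor
 * preparation is needed before the soft-offline CPUs are woken up
 * for kexec.
 */
static void pnv_kexec_wake_prepare(void)
{
        /* platform-specific preparation goes here */
}
#endif

/* Hypothetical helper, invoked from the platform's setup code. */
static void __init pnv_setup_kexec_wake_prepare(void)
{
#ifdef CONFIG_KEXEC
        ppc_md.kexec_wake_prepare = pnv_kexec_wake_prepare;
#endif
}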


[PATCH v2] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline

2014-06-10 Thread Srivatsa S. Bhat
ak any existing code;
hence let's go with the solution proposed above until that is done).

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Srivatsa S. Bhat 
Suggested-by: Frederic Weisbecker 
Cc: "Paul E. McKenney" 
Cc: Borislav Petkov 
Cc: Christoph Hellwig 
Cc: Frederic Weisbecker 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Mel Gorman 
Cc: Mike Galbraith 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: Rik van Riel 
Cc: Rusty Russell 
Cc: Srivatsa S. Bhat 
Cc: Steven Rostedt 
Cc: Tejun Heo 
Cc: Thomas Gleixner 
---

Changes in v2:

* Modified the changelog to make it more accurate and easy to understand,
  based on feedback by Peter Zijlstra.
* Replaced the term "IPI functions" with "IPI callbacks" in the code
  comments.
* Absolutely no changes in the code.

 kernel/smp.c |   56 
 1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 306f818..ef3941d 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,6 +29,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
 
+static void flush_smp_call_function_queue(bool warn_cpu_offline);
+
 static int
 hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -52,6 +54,20 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
 
+   case CPU_DYING:
+   case CPU_DYING_FROZEN:
+   /*
+* The IPIs for the smp-call-function callbacks queued by other
+* CPUs might arrive late, either due to hardware latencies or
+* because this CPU disabled interrupts (inside stop-machine)
+* before the IPIs were sent. So flush out any pending callbacks
+* explicitly (without waiting for the IPIs to arrive), to
+* ensure that the outgoing CPU doesn't go offline with work
+* still pending.
+*/
+   flush_smp_call_function_queue(false);
+   break;
+
case CPU_DEAD:
case CPU_DEAD_FROZEN:
free_cpumask_var(cfd->cpumask);
@@ -177,23 +193,47 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
return 0;
 }
 
-/*
- * Invoked by arch to handle an IPI for call function single. Must be
- * called from the arch with interrupts disabled.
+/**
+ * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
+ *
+ * Invoked by arch to handle an IPI for call function single.
+ * Must be called with interrupts disabled.
  */
 void generic_smp_call_function_single_interrupt(void)
 {
+   flush_smp_call_function_queue(true);
+}
+
+/**
+ * flush_smp_call_function_queue - Flush pending smp-call-function callbacks
+ *
+ * @warn_cpu_offline: If set to 'true', warn if callbacks were queued on an
+ *   offline CPU. Skip this check if set to 'false'.
+ *
+ * Flush any pending smp-call-function callbacks queued on this CPU. This is
+ * invoked by the generic IPI handler, as well as by a CPU about to go offline,
+ * to ensure that all pending IPI callbacks are run before it goes completely
+ * offline.
+ *
+ * Loop through the call_single_queue and run all the queued callbacks.
+ * Must be called with interrupts disabled.
+ */
+static void flush_smp_call_function_queue(bool warn_cpu_offline)
+{
+   struct llist_head *head;
struct llist_node *entry;
struct call_single_data *csd, *csd_next;
static bool warned;
 
-   entry = llist_del_all(&__get_cpu_var(call_single_queue));
+   WARN_ON(!irqs_disabled());
+
+   head = &__get_cpu_var(call_single_queue);
+   entry = llist_del_all(head);
entry = llist_reverse_order(entry);
 
-   /*
-* Shouldn't receive this interrupt on a cpu that is not yet online.
-*/
-   if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
+   /* There shouldn't be any pending callbacks on an offline CPU. */
+   if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
+!warned && !llist_empty(head))) {
warned = true;
WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
 

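To make the race that this closes a bit more concrete, here is a minimal
sketch (illustration only, not part of the patch) of a sender queuing a
non-waiting smp-call-function callback. Since wait=0, nothing forces the
callback to run before the target CPU disables interrupts in stop-machine on
its way offline; with the CPU_DYING flush above, the outgoing CPU drains
call_single_queue itself, so the callback still runs before the CPU is gone.
The function names below are made up for the example:

#include <linux/smp.h>
#include <linux/printk.h>

/* Example callback; just reports where it ran. */
static void example_remote_work(void *info)
{
        pr_info("smp-call-function callback ran on CPU %d\n",
                smp_processor_id());
}

/*
 * Queue example_remote_work() on 'cpu' without waiting for completion.
 * If 'cpu' starts going offline right after this, the callback may still
 * be sitting in its call_single_queue when interrupts are cut off; the
 * CPU_DYING flush ensures it is executed before the CPU goes away.
 */
static void queue_example_on(int cpu)
{
        smp_call_function_single(cpu, example_remote_work, NULL, 0);
}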

