Re: [PATCH 3/3] tracing/hwlat: Fix a few trivial nits
On 10/14/19 12:24 PM, Steven Rostedt wrote:
> On Thu, 10 Oct 2019 11:51:17 -0700
> "Srivatsa S. Bhat" wrote:
>
>> From: Srivatsa S. Bhat (VMware)
>>
>> Update the source file name in the comments, and fix a grammatical
>> error.
>
> Patch 1 and 2 have already been applied to Linus's tree.
>
> I've queued this one up for the next merge window.
>
> Thanks!

Thanks a lot Steve!

Regards,
Srivatsa
[PATCH 3/3] tracing/hwlat: Fix a few trivial nits
From: Srivatsa S. Bhat (VMware)

Update the source file name in the comments, and fix a grammatical
error.

Signed-off-by: Srivatsa S. Bhat (VMware)
---
 kernel/trace/trace_hwlat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 862f4b0..941cb82 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * trace_hwlatdetect.c - A simple Hardware Latency detector.
+ * trace_hwlat.c - A simple Hardware Latency detector.
  *
  * Use this tracer to detect large system latencies induced by the behavior of
  * certain underlying system hardware or firmware, independent of Linux itself.
@@ -276,7 +276,7 @@ static void move_to_next_cpu(void)
 		return;
 	/*
 	 * If for some reason the user modifies the CPU affinity
-	 * of this thread, than stop migrating for the duration
+	 * of this thread, then stop migrating for the duration
 	 * of the current test.
 	 */
 	if (!cpumask_equal(current_mask, current->cpus_ptr))
[PATCH 2/3] tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
From: Srivatsa S. Bhat (VMware)

max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.

Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer")
Cc: sta...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware)
---
 kernel/trace/trace_hwlat.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index a0251a7..862f4b0 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -256,6 +256,8 @@ static int get_sample(void)
 		/* Keep a running maximum ever recorded hardware latency */
 		if (sample > tr->max_latency)
 			tr->max_latency = sample;
+		if (outer_sample > tr->max_latency)
+			tr->max_latency = outer_sample;
 	}

 out:
[PATCH 1/3] tracing/hwlat: Report total time spent in all NMIs during the sample
From: Srivatsa S. Bhat (VMware)

nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.

Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector")
Cc: sta...@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware)
---
 kernel/trace/trace_hwlat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index fa95139..a0251a7 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -150,7 +150,7 @@ void trace_hwlat_callback(bool enter)
 		if (enter)
 			nmi_ts_start = time_get();
 		else
-			nmi_total_ts = time_get() - nmi_ts_start;
+			nmi_total_ts += time_get() - nmi_ts_start;
 	}

 	if (enter)
Re: [PATCH BUGFIX IMPROVEMENT 0/7] boost throughput with synced I/O, reduce latency and fix a bandwidth bug
On 6/24/19 12:40 PM, Paolo Valente wrote:
> Hi Jens,
> this series, based against for-5.3/block, contains:
> 1) The improvements to recover the throughput loss reported by
>    Srivatsa [1] (first five patches)
> 2) A preemption improvement to reduce I/O latency
> 3) A fix of a subtle bug causing loss of control over I/O bandwidths

Thanks a lot for these patches, Paolo! Would you mind adding:

Reported-by: Srivatsa S. Bhat (VMware)
Tested-by: Srivatsa S. Bhat (VMware)

to the first 5 patches, as appropriate? Thank you!

> [1] https://lkml.org/lkml/2019/5/17/755
>
> Paolo Valente (7):
>   block, bfq: reset inject limit when think-time state changes
>   block, bfq: fix rq_in_driver check in bfq_update_inject_limit
>   block, bfq: update base request service times when possible
>   block, bfq: bring forward seek time update
>   block, bfq: detect wakers and unconditionally inject their I/O
>   block, bfq: preempt lower-weight or lower-priority queues
>   block, bfq: re-schedule empty queues if they deserve I/O plugging
>
>  block/bfq-iosched.c | 952 ++--
>  block/bfq-iosched.h |  25 +-
>  2 files changed, 686 insertions(+), 291 deletions(-)

Regards,
Srivatsa
VMware Photon OS
Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller
On 5/22/19 2:12 AM, Paolo Valente wrote:
>
>> On 22 May 2019, at 11:02, Srivatsa S. Bhat wrote:
>>
>> Let's continue here on LKML itself.
>
> Just done :)
>
>> The only reason I created the bugzilla entry is to attach the
>> tarball of the traces, assuming that it would allow me to upload a
>> 20 MB file (since email attachment didn't work). But bugzilla's file
>> restriction is much smaller than that, so it didn't work out either,
>> and I resorted to using dropbox. So we don't need the bugzilla entry
>> anymore; I might as well close it to avoid confusion.
>
> No no, don't close it: it can reach people that don't use LKML. We
> just have to remember to report back at the end of this.

Ah, good point!

> BTW, I also think that the bug is incorrectly filed against 5.1,
> while all these tests and results concern 5.2-rcX.

Fixed now, thank you for pointing out!

Regards,
Srivatsa
VMware Photon OS
Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller
On 5/22/19 2:09 AM, Paolo Valente wrote:
>
> First, thank you very much for testing my patches, and, above all,
> for sharing those huge traces!
>
> According to your traces, the residual 20% lower throughput that you
> record is due to the fact that the BFQ injection mechanism takes a
> few hundredths of seconds to stabilize, at the beginning of the
> workload. During that setup time, the throughput is equal to the
> dreadful ~60-90 KB/s that you see without this new patch. After that
> time, there seems to be no loss according to the trace.
>
> The problem is that a loss lasting only a few hundredths of seconds
> is however not negligible for a write workload that lasts only 3-4
> seconds. Could you please try writing a larger file?

I tried running dd for longer (about 100 seconds), but still saw
around 1.4 MB/s throughput with BFQ, and between 1.5 MB/s - 1.6 MB/s
with mq-deadline and noop. But I'm not too worried about that
difference.

> In addition, I wanted to ask you whether you measured BFQ throughput
> with traces disabled. This may make a difference.

The above result (1.4 MB/s) was obtained with traces disabled.

> After trying writing a larger file, you can try with low_latency on.
> On my side, it causes results to become a little unstable across
> repetitions (which is expected).

With low_latency on, I get between 60 KB/s - 100 KB/s.

Regards,
Srivatsa
VMware Photon OS
[PATCH] tracing: Fix documentation about disabling options using trace_options
From: Srivatsa S. Bhat (VMware)

To disable a tracing option using the trace_options file, the option
name needs to be prefixed with 'no', not suffixed with it as the
README currently states. Fix it.

Signed-off-by: Srivatsa S. Bhat (VMware)
---
 kernel/trace/trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c521b73..d632458 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4541,7 +4541,7 @@ static const char readme_msg[] =
 	"  instances\t\t- Make sub-buffers with: mkdir instances/foo\n"
 	"\t\t\t  Remove sub-buffer with rmdir\n"
 	"  trace_options\t\t- Set format or modify how tracing happens\n"
-	"\t\t\t  Disable an option by adding a suffix 'no' to the\n"
+	"\t\t\t  Disable an option by prefixing 'no' to the\n"
 	"\t\t\t  option name\n"
 	"  saved_cmdlines_size\t- echo command number in here to store comm-pid list\n"
 #ifdef CONFIG_DYNAMIC_FTRACE
[PATCH 4.4.y 046/101] x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
From: Ingo Molnar

commit d72f4e29e6d84b7ec02ae93088aa459ac70e733b upstream.

firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.

Since we want to keep <asm/nospec-branch.h> low level, convert them to
macros to avoid header hell...

Cc: David Woodhouse
Cc: Thomas Gleixner
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: arjan.van.de@intel.com
Cc: b...@alien8.de
Cc: dave.han...@intel.com
Cc: jmatt...@google.com
Cc: karah...@amazon.de
Cc: k...@vger.kernel.org
Cc: pbonz...@redhat.com
Cc: rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Srivatsa S. Bhat
Reviewed-by: Matt Helsley (VMware)
Reviewed-by: Alexey Makhalov
Reviewed-by: Bo Gan
---
 arch/x86/include/asm/nospec-branch.h | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 36ded24..b9dd1d9 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,20 +214,22 @@ static inline void indirect_branch_prediction_barrier(void)
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
+ *
+ * (Implemented as CPP macros due to header hell.)
  */
-static inline void firmware_restrict_branch_speculation_start(void)
-{
-	preempt_disable();
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
-			      X86_FEATURE_USE_IBRS_FW);
-}
+#define firmware_restrict_branch_speculation_start()			\
+do {									\
+	preempt_disable();						\
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,	\
+			      X86_FEATURE_USE_IBRS_FW);			\
+} while (0)

-static inline void firmware_restrict_branch_speculation_end(void)
-{
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
-			      X86_FEATURE_USE_IBRS_FW);
-	preempt_enable();
-}
+#define firmware_restrict_branch_speculation_end()			\
+do {									\
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,			\
+			      X86_FEATURE_USE_IBRS_FW);			\
+	preempt_enable();						\
+} while (0)

 #endif /* __ASSEMBLY__ */
[PATCH 4.4.y 033/101] x86/asm/entry/32: Simplify pushes of zeroed pt_regs->REGs
From: Denys Vlasenko

commit 778843f934e362ed4ed734520f60a44a78a074b4 upstream

Use of a temporary R8 register here seems to be unnecessary.
"push %r8" is a two-byte insn (it needs REX prefix to specify R8),
"push $0" is two-byte too. It seems just using the latter would be
no worse. Thus, code had an unnecessary "xorq %r8,%r8" insn.
It probably costs nothing in execution time here since we are
probably limited by store bandwidth at this point, but still.

Run-tested under QEMU: 32-bit calls still work:

/ # ./test_syscall_vdso32
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[NOTE]	R11 has changed:00200ed7 - assuming clobbered by SYSRET insn
[OK]	R8..R15 did not leak kernel data
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data
[RUN]	Running tests under ptrace
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[NOTE]	R11 has changed:00200ed7 - assuming clobbered by SYSRET insn
[OK]	R8..R15 did not leak kernel data
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data

Signed-off-by: Denys Vlasenko
Acked-by: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Frederic Weisbecker
Cc: H. Peter Anvin
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Will Drewry
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1462201010-16846-1-git-send-email-dvlas...@redhat.com
Signed-off-by: Ingo Molnar
Signed-off-by: Srivatsa S. Bhat
Reviewed-by: Matt Helsley (VMware)
Reviewed-by: Alexey Makhalov
Reviewed-by: Bo Gan
---
 arch/x86/entry/entry_64_compat.S | 45 ++
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index d03bf0e..e479ff8 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -79,24 +79,23 @@ ENTRY(entry_SYSENTER_compat)
 	ASM_CLAC			/* Clear AC after saving FLAGS */

 	pushq	$__USER32_CS		/* pt_regs->cs */
-	xorq	%r8,%r8
-	pushq	%r8			/* pt_regs->ip = 0 (placeholder) */
+	pushq	$0			/* pt_regs->ip = 0 (placeholder) */
 	pushq	%rax			/* pt_regs->orig_ax */
 	pushq	%rdi			/* pt_regs->di */
 	pushq	%rsi			/* pt_regs->si */
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rcx			/* pt_regs->cx */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq	%r8			/* pt_regs->r8 = 0 */
-	pushq	%r8			/* pt_regs->r9 = 0 */
-	pushq	%r8			/* pt_regs->r10 = 0 */
-	pushq	%r8			/* pt_regs->r11 = 0 */
+	pushq	$0			/* pt_regs->r8 = 0 */
+	pushq	$0			/* pt_regs->r9 = 0 */
+	pushq	$0			/* pt_regs->r10 = 0 */
+	pushq	$0			/* pt_regs->r11 = 0 */
 	pushq	%rbx			/* pt_regs->rbx */
 	pushq	%rbp			/* pt_regs->rbp (will be overwritten) */
-	pushq	%r8			/* pt_regs->r12 = 0 */
-	pushq	%r8			/* pt_regs->r13 = 0 */
-	pushq	%r8			/* pt_regs->r14 = 0 */
-	pushq	%r8			/* pt_regs->r15 = 0 */
+	pushq	$0			/* pt_regs->r12 = 0 */
+	pushq	$0			/* pt_regs->r13 = 0 */
+	pushq	$0			/* pt_regs->r14 = 0 */
+	pushq	$0			/* pt_regs->r15 = 0 */
 	cld

 	/*
@@ -185,17 +184,16 @@ ENTRY(entry_SYSCALL_compat)
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rbp			/* pt_regs->cx (stashed in bp) */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	xorq	%r8,%r8
-	pushq	%r8			/* pt_regs->r8 = 0 */
-	pushq	%r8			/* pt_regs->r9 = 0 */
-	pushq	%r8			/* pt_regs->r10 = 0 */
-	pushq	%r8			/* pt_regs->r11 = 0 */
+	pushq	$0			/* pt_regs->r8 = 0 */
+	pushq	$0			/* pt_regs->r9 = 0 */
+	pushq	$0			/* pt_regs->r10 = 0 */
+	pushq	$0			/* pt_regs->r11 = 0 */
 	pushq
[PATCH 4.4.y 037/101] x86/speculation: Clean up various Spectre related details
From: Ingo Molnar

commit 21e433bdb95bdf3aa48226fd3d33af608437f293 upstream.

Harmonize all the Spectre messages so that a:

    dmesg | grep -i spectre

... gives us most Spectre related kernel boot messages.

Also fix a few other details:

 - clarify a comment about firmware speculation control
 - s/KPTI/PTI
 - remove various line-breaks that made the code uglier

Acked-by: David Woodhouse
Cc: Andy Lutomirski
Cc: Arjan van de Ven
Cc: Borislav Petkov
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Woodhouse
Cc: Greg Kroah-Hartman
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Srivatsa S. Bhat
Reviewed-by: Matt Helsley (VMware)
Reviewed-by: Alexey Makhalov
Reviewed-by: Bo Gan
---
 arch/x86/kernel/cpu/bugs.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1968baf..fea368d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -162,8 +162,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
 		return SPECTRE_V2_CMD_NONE;
 	else {
-		ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
-					  sizeof(arg));
+		ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg));
 		if (ret < 0)
 			return SPECTRE_V2_CMD_AUTO;
@@ -184,8 +183,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	    cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
 	    cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
 	    !IS_ENABLED(CONFIG_RETPOLINE)) {
-		pr_err("%s selected but not compiled in. Switching to AUTO select\n",
-		       mitigation_options[i].option);
+		pr_err("%s selected but not compiled in. Switching to AUTO select\n", mitigation_options[i].option);
 		return SPECTRE_V2_CMD_AUTO;
 	}
@@ -255,14 +253,14 @@ static void __init spectre_v2_select_mitigation(void)
 		goto retpoline_auto;
 		break;
 	}
-	pr_err("kernel not compiled with retpoline; no mitigation available!");
+	pr_err("Spectre mitigation: kernel not compiled with retpoline; no mitigation available!");
 	return;

 retpoline_auto:
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
 	retpoline_amd:
 		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
-			pr_err("LFENCE not serializing. Switching to generic retpoline\n");
+			pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
 			goto retpoline_generic;
 		}
 		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
@@ -280,7 +278,7 @@ retpoline_auto:
 	pr_info("%s\n", spectre_v2_strings[mode]);

 	/*
-	 * If neither SMEP or KPTI are available, there is a risk of
+	 * If neither SMEP nor PTI are available, there is a risk of
 	 * hitting userspace addresses in the RSB after a context switch
 	 * from a shallow call stack to a deeper one. To prevent this fill
 	 * the entire RSB, even when using IBRS.
@@ -294,21 +292,20 @@ retpoline_auto:
 	if ((!boot_cpu_has(X86_FEATURE_KAISER) &&
 	     !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
 		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-		pr_info("Filling RSB on context switch\n");
+		pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
 	}

 	/* Initialize Indirect Branch Prediction Barrier if supported */
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
 		setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
-		pr_info("Enabling Indirect Branch Prediction Barrier\n");
+		pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n");
 	}
 }

 #undef pr_fmt

 #ifdef CONFIG_SYSFS
-ssize_t cpu_show_meltdown(struct device *dev,
-			  struct device_attribute *attr, char *buf)
+ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
 		return sprintf(buf, "Not affected\n");
@@ -317,16 +314,14 @@ ssize_t cpu_show_meltdown(struct device *dev,
 	return sprintf(buf, "Vulnerable\n");
 }

-ssize_t cpu_show_spectre_v1(struct device *dev,
-			    struct
Re: [PATCH 1/5] random: fix crng_ready() test
On 4/13/18 10:00 AM, Theodore Y. Ts'o wrote:
> On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote:
>>
>> What I would like to point out that more and more folks change to
>> getrandom(2). As this call will now unblock much later in the boot
>> cycle, these systems see a significant departure from the current
>> system behavior.
>>
>> E.g. an sshd using getrandom(2) would be ready shortly after the
>> boot finishes as of now. Now it can be a matter minutes before it
>> responds. Thus, is such change in the kernel behavior something for
>> stable?
[...]
> I was a little worried that on VM's this could end up causing things
> to block for a long time, but an experiment on a GCE VM shows that
> isn't a problem:
>
> [0.00] Linux version 4.16.0-rc3-ext4-9-gf6b302ebca85 (tytso@cwcc) (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018
> [1.282220] random: fast init done
> [3.987092] random: crng init done
> [4.376787] EXT4-fs (sda1): re-mounted. Opts: (null)
>
> There are some desktops where the "crng_init done" report doesn't
> happen until 45-90 seconds into the boot. I don't think I've seen
> reports where it takes _minutes_ however. Can you give me some
> examples of such cases?

On a Photon OS VM running on VMware ESXi, this patch causes a boot
speed regression of 5 minutes :-(

[ The VM doesn't have haveged or rng-tools (rngd) installed. ]

[1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered
[1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz
[1.470707] clocksource: tsc: mask: 0x max_cycles: 0x36c65c1a9e1, max_idle_ns: 881590695311 ns
[1.474249] clocksource: Switched to clocksource tsc
[1.584427] systemd-journald[216]: Received request to flush runtime journal from PID 1
[  346.620718] random: crng init done

Interestingly, the boot delay is exacerbated on VMs with large amounts
of RAM. For example, the delay is not so noticeable (< 30 seconds) on
a VM with 2GB memory, but goes up to 5 minutes on an 8GB VM.

Also, cloud-init-local.service seems to be the one blocking for
entropy here. systemd-analyze critical-chain shows:

The time after the unit is active or started is printed after the "@"
character. The time the unit takes to start is printed after the "+"
character.

multi-user.target @6min 1.283s
└─vmtoolsd.service @6min 1.282s
  └─cloud-final.service @6min 366ms +914ms
    └─cloud-config.service @5min 59.174s +1.190s
      └─cloud-config.target @5min 59.172s
        └─cloud-init.service @5min 47.423s +11.744s
          └─systemd-networkd-wait-online.service @5min 45.999s +1.420s
            └─systemd-networkd.service @5min 45.975s +21ms
              └─network-pre.target @5min 45.973s
                └─cloud-init-local.service @241ms +5min 45.687s
                  └─systemd-remount-fs.service @222ms +13ms
                    └─systemd-fsck-root.service @193ms +26ms
                      └─systemd-journald.socket @188ms
                        └─-.mount @151ms
                          └─system.slice @161ms
                            └─-.slice @151ms

It would be great if this CVE can be fixed somehow without causing
boot speed to spike from ~20 seconds to 5 minutes, as that makes the
system pretty much unusable. I can work around this by installing
haveged, but ideally an in-kernel fix would be better.

If you need any other info about my setup or if you have a patch that
I can test, please let me know! Thank you very much!

Regards,
Srivatsa
VMware Photon OS
Re: [PATCH 1/5] random: fix crng_ready() test
On 4/13/18 10:00 AM, Theodore Y. Ts'o wrote: > On Fri, Apr 13, 2018 at 03:05:01PM +0200, Stephan Mueller wrote: >> >> What I would like to point out that more and more folks change to >> getrandom(2). As this call will now unblock much later in the boot cycle, >> these systems see a significant departure from the current system behavior. >> >> E.g. an sshd using getrandom(2) would be ready shortly after the boot >> finishes >> as of now. Now it can be a matter minutes before it responds. Thus, is such >> change in the kernel behavior something for stable? [...] > I was a little worried that on VM's this could end up causing things > to block for a long time, but an experiment on a GCE VM shows that > isn't a problem: > > [0.00] Linux version 4.16.0-rc3-ext4-9-gf6b302ebca85 (tytso@cwcc) > (gcc version 7.3.0 (Debian 7.3.0-15)) #16 SMP Thu Apr 12 16:57:17 EDT 2018 > [1.282220] random: fast init done > [3.987092] random: crng init done > [4.376787] EXT4-fs (sda1): re-mounted. Opts: (null) > > There are some desktops where the "crng_init done" report doesn't > happen until 45-90 seconds into the boot. I don't think I've seen > reports where it takes _minutes_ however. Can you give me some > examples of such cases? On a Photon OS VM running on VMware ESXi, this patch causes a boot speed regression of 5 minutes :-( [ The VM doesn't have haveged or rng-tools (rngd) installed. ] [1.420246] EXT4-fs (sda2): re-mounted. Opts: barrier,noacl,data=ordered [1.469722] tsc: Refined TSC clocksource calibration: 1900.002 MHz [1.470707] clocksource: tsc: mask: 0x max_cycles: 0x36c65c1a9e1, max_idle_ns: 881590695311 ns [1.474249] clocksource: Switched to clocksource tsc [1.584427] systemd-journald[216]: Received request to flush runtime journal from PID 1 [ 346.620718] random: crng init done Interestingly, the boot delay is exacerbated on VMs with large amounts of RAM. 
For example, the delay is not so noticeable (< 30 seconds) on a VM with 2GB memory, but goes up to 5 minutes on an 8GB VM.

Also, cloud-init-local.service seems to be the one blocking for entropy here. systemd-analyze critical-chain shows:

The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

multi-user.target @6min 1.283s
└─vmtoolsd.service @6min 1.282s
  └─cloud-final.service @6min 366ms +914ms
    └─cloud-config.service @5min 59.174s +1.190s
      └─cloud-config.target @5min 59.172s
        └─cloud-init.service @5min 47.423s +11.744s
          └─systemd-networkd-wait-online.service @5min 45.999s +1.420s
            └─systemd-networkd.service @5min 45.975s +21ms
              └─network-pre.target @5min 45.973s
                └─cloud-init-local.service @241ms +5min 45.687s
                  └─systemd-remount-fs.service @222ms +13ms
                    └─systemd-fsck-root.service @193ms +26ms
                      └─systemd-journald.socket @188ms
                        └─-.mount @151ms
                          └─system.slice @161ms
                            └─-.slice @151ms

It would be great if this CVE can be fixed somehow without causing boot speed to spike from ~20 seconds to 5 minutes, as that makes the system pretty much unusable. I can work around this by installing haveged, but ideally an in-kernel fix would be better.

If you need any other info about my setup or if you have a patch that I can test, please let me know! Thank you very much!

Regards,
Srivatsa
VMware Photon OS
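For anyone debugging a delay like this, the kernel RNG state that gates the stall can be inspected from userspace. A minimal sketch, assuming a standard Linux procfs layout (the numbers printed are of course system-dependent):

```shell
# Entropy estimate currently available to the kernel RNG; on the
# affected VMs this climbs very slowly, which is what stalls the
# crng initialization that getrandom(2) now waits for.
cat /proc/sys/kernel/random/entropy_avail

# Reads from /dev/urandom never block, even before "crng init done" --
# which is why a userspace entropy daemon like haveged works as a
# stopgap while getrandom(2) callers would otherwise wait.
head -c 16 /dev/urandom | wc -c
```

On an already-booted system both commands return immediately; the interesting case is running them from an early-boot unit such as cloud-init-local.service.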
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 3/21/18 10:12 PM, Srivatsa S. Bhat wrote: > On 3/21/18 7:02 PM, Steve French wrote: >> Found a patch which solves the dependency issue. In my testing (on >> 4.9, with Windows 2016, and also to Samba) as Pavel suggested this >> appears to fix the problem, but I will let Srivatsa confirm that it >> also fixes it for him. The two attached patches for 4.9 should work. >> > > Indeed, those two patches fix the problem for me on 4.9. Thanks a lot > Steve, Pavel and Aurelien for all your efforts in fixing this! > > I was also interested in getting this fixed on 4.4, so I modified the > patches to apply on 4.4.88 and verified that they fix the mount I meant to say 4.4.122 there (the latest stable 4.4 version at the moment). Regards, Srivatsa VMware Photon OS
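For context, the backport flow described above (re-applying mainline fixes onto an older stable branch) is the usual format-patch/am cycle. A self-contained sketch against a throwaway repository; the file name and branch layout below are illustrative, not the actual CIFS commits or the real linux-stable tree:

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=test GIT_AUTHOR_EMAIL=test@example.com
export GIT_COMMITTER_NAME=test GIT_COMMITTER_EMAIL=test@example.com

# Stand-in for a stable checkout (in reality: clone linux-stable
# and check out the linux-4.4.y branch).
git init -q repo && cd repo
git commit -q --allow-empty -m "4.4 base"
base=$(git rev-parse HEAD)

# The mainline fix, committed on top of the base...
echo "smb3 fix" > smb2pdu.c
git add smb2pdu.c
git commit -q -m "SMB3: Validate negotiate request must always be signed"

# ...is exported with format-patch and replayed onto the stable
# branch with git am; any conflicts are then resolved by hand
# (the "[ Fixed up for kernel version 4.4 ]" step).
git format-patch -1 -o ../patches >/dev/null
git checkout -q -b linux-4.4.y "$base"
git am -q ../patches/*.patch
git log --oneline -1
```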
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 3/21/18 7:02 PM, Steve French wrote:
> Found a patch which solves the dependency issue. In my testing (on
> 4.9, with Windows 2016, and also to Samba) as Pavel suggested this
> appears to fix the problem, but I will let Srivatsa confirm that it
> also fixes it for him. The two attached patches for 4.9 should work.
>

Indeed, those two patches fix the problem for me on 4.9. Thanks a lot
Steve, Pavel and Aurelien for all your efforts in fixing this!

I was also interested in getting this fixed on 4.4, so I modified the
patches to apply on 4.4.88 and verified that they fix the mount
failure. I have attached my patches for 4.4 with this mail.

Steve, Pavel, could you kindly double-check the second patch for 4.4,
especially around the keygen_exit error path?

Thank you very much!

Regards,
Srivatsa
VMware Photon OS

From a01a7dfb60e2d5421a487a7b81fd8a1bf72d96d4 Mon Sep 17 00:00:00 2001
From: Steve French <smfre...@gmail.com>
Date: Sun, 11 Mar 2018 20:00:27 -0700
Subject: [PATCH 1/2] SMB3: Validate negotiate request must always be signed

commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream.

According to MS-SMB2 3.2.55 validate_negotiate request must
always be signed. Some Windows can fail the request if you send it
unsigned

See kernel bugzilla bug 197311

[ Fixed up for kernel version 4.4 ]

CC: Stable <sta...@vger.kernel.org>
Acked-by: Ronnie Sahlberg
Signed-off-by: Steve French <smfre...@gmail.com>
Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
---
 fs/cifs/smb2pdu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 84614a5..6dae5b8 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1558,6 +1558,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
 	} else
 		iov[0].iov_len = get_rfc1002_length(req) + 4;

+	/* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
+	if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
+		req->hdr.Flags |= SMB2_FLAGS_SIGNED;
 	rc = SendReceive2(xid, ses, iov, num_iovecs, _buftype, 0);

 	rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base;
--
2.7.4

From d0178d8f096b29a88914787274bdc8ee8334ab07 Mon Sep 17 00:00:00 2001
From: Pavel Shilovsky <pshi...@microsoft.com>
Date: Mon, 7 Nov 2016 18:20:50 -0800
Subject: [PATCH 2/2] CIFS: Enable encryption during session setup phase

commit cabfb3680f78981d26c078a26e5c748531257ebb upstream.

In order to allow encryption on SMB connection we need to exchange
a session key and generate encryption and decryption keys.

[ Fixed up for kernel version 4.4 ]

Signed-off-by: Pavel Shilovsky <pshi...@microsoft.com>
Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu>
---
 fs/cifs/sess.c    | 22 ++
 fs/cifs/smb2pdu.c |  8 +---
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c
index e88ffe1..a035d1a 100644
--- a/fs/cifs/sess.c
+++ b/fs/cifs/sess.c
@@ -344,13 +344,12 @@ void build_ntlmssp_negotiate_blob(unsigned char *pbuffer,
 	/* BB is NTLMV2 session security format easier to use here? */
 	flags = NTLMSSP_NEGOTIATE_56 | NTLMSSP_REQUEST_TARGET |
 		NTLMSSP_NEGOTIATE_128 | NTLMSSP_NEGOTIATE_UNICODE |
-		NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC;
-	if (ses->server->sign) {
+		NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC |
+		NTLMSSP_NEGOTIATE_SEAL;
+	if (ses->server->sign)
 		flags |= NTLMSSP_NEGOTIATE_SIGN;
-		if (!ses->server->session_estab ||
-		    ses->ntlmssp->sesskey_per_smbsess)
-			flags |= NTLMSSP_NEGOTIATE_KEY_XCH;
-	}
+	if (!ses->server->session_estab || ses->ntlmssp->sesskey_per_smbsess)
+		flags |= NTLMSSP_NEGOTIATE_KEY_XCH;

 	sec_blob->NegotiateFlags = cpu_to_le32(flags);

@@ -407,13 +406,12 @@ int build_ntlmssp_auth_blob(unsigned char **pbuffer,
 	flags = NTLMSSP_NEGOTIATE_56 | NTLMSSP_REQUEST_TARGET |
 		NTLMSSP_NEGOTIATE_TARGET_INFO |
 		NTLMSSP_NEGOTIATE_128 | NTLMSSP_NEGOTIATE_UNICODE |
-		NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC;
-	if (ses->server->sign) {
+		NTLMSSP_NEGOTIATE_NTLM | NTLMSSP_NEGOTIATE_EXTENDED_SEC |
+		NTLMSSP_NEGOTIATE_SEAL;
+	if (ses->server->sign)
 		flags |= NTLMSSP_NEGOTIATE_SIGN;
-		if (!ses->server->session_estab ||
-		    ses->ntlmssp->sesskey_per_smbsess)
-			flags |= NTLMSSP_NEGOTIATE_KEY_XCH;
-	}
+	if (!ses->server->session_estab || ses->ntlmssp->sesskey_per_smbsess)
+		flags |= NTLMSSP_NEGOTIATE_KEY_XCH;

 	tmp = *pbuffer +
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 3/1/18 12:12 PM, Steve French wrote: > So far I haven't been able to reproduce this on the current 4.9 stable > tree with vers=3.0 or with default (vers=1.0 for these older kernels). > Maybe the problem also depends on the particular version of Windows that hosts the SMB shares? I'm using Windows Server 2016 (Version 1607, OS Build 14393.693). With vers=3.0, the issue is reproducible every time, but vers=1.0 works fine. Regards, Srivatsa > On Tue, Feb 27, 2018 at 11:56 AM, Steve French <smfre...@gmail.com> wrote: >> This shouldn't be too hard to figure out if willing to backport a >> slightly larger set of fixes to the older stable, but I don't have a >> system running 4.9 stable. >> >> Is this the correct stable tree branch? >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/?h=linux-4.9.y >> >> On Tue, Feb 27, 2018 at 11:45 AM, Srivatsa S. Bhat >> <sriva...@csail.mit.edu> wrote: >>> On 2/27/18 4:40 AM, Greg Kroah-Hartman wrote: >>>> On Tue, Feb 27, 2018 at 01:22:31AM -0800, Srivatsa S. Bhat wrote: >>>>> On 2/27/18 12:54 AM, Greg Kroah-Hartman wrote: >>>>>> On Mon, Feb 26, 2018 at 07:44:28PM -0800, Srivatsa S. Bhat wrote: >>>>>>> On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote: >>>>>>>> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote: >>>>>>>>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote: >>>>>>>>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman: >>>>>>>>>>> 4.13-stable review patch. If anyone has any objections, please let >>>>>>>>>>> me know. >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> From: Steve French <smfre...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream. >>>>>>>>>>> >>>>>>>>>>> According to MS-SMB2 3.2.55 validate_negotiate request must >>>>>>>>>>> always be signed. 
Some Windows can fail the request if you send it >>>>>>>>>>> unsigned >>>>>>>>>>> >>>>>>>>>>> See kernel bugzilla bug 197311 >>>>>>>>>>> >>>>>>>>>>> Acked-by: Ronnie Sahlberg >>>>>>>>>>> Signed-off-by: Steve French <smfre...@gmail.com> >>>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> >>>>>>>>>>> >>>>>>>>>>> --- >>>>>>>>>>> fs/cifs/smb2pdu.c |3 +++ >>>>>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>>>>> >>>>>>>>>>> --- a/fs/cifs/smb2pdu.c >>>>>>>>>>> +++ b/fs/cifs/smb2pdu.c >>>>>>>>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc >>>>>>>>>>>} else >>>>>>>>>>>iov[0].iov_len = get_rfc1002_length(req) + 4; >>>>>>>>>>> + /* validate negotiate request must be signed - see MS-SMB2 >>>>>>>>>>> 3.2.5.5 */ >>>>>>>>>>> + if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) >>>>>>>>>>> + req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED; >>>>>>>>>>>rc = SendReceive2(xid, ses, iov, n_iov, _buftype, >>>>>>>>>>> flags, _iov); >>>>>>>>>>>cifs_small_buf_release(req); >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This one needs to be backported to all stable kernels as the commit >>>>>>>>>> that >>>>>>>>>> introduced the regression: >>>>>>>>>> ' >>>>>>>>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9 >>>>>>>>>> SMB: Validate negotiate (to protect against downgrade) even if >>>>>>>>>> signing off >>
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 2/27/18 9:56 AM, Steve French wrote: > This shouldn't be too hard to figure out if willing to backport a > slightly larger set of fixes to the older stable, but I don't have a > system running 4.9 stable. > If you have the proposed patches that apply on 4.9, I'd be happy to try them out! [ I would have offered to backport the patches myself, but actually I already tried doing that with a larger set of patches from mainline (picking those commits between the regression and the fix that seemed relevant), but I felt quite out-of-depth trying to adapt them to 4.9 and 4.4, as I'm not that familiar with the internals of SMB/CIFS. ] > Is this the correct stable tree branch? > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/?h=linux-4.9.y > Yep! Regards, Srivatsa > On Tue, Feb 27, 2018 at 11:45 AM, Srivatsa S. Bhat > <sriva...@csail.mit.edu> wrote: >> On 2/27/18 4:40 AM, Greg Kroah-Hartman wrote: >>> On Tue, Feb 27, 2018 at 01:22:31AM -0800, Srivatsa S. Bhat wrote: >>>> On 2/27/18 12:54 AM, Greg Kroah-Hartman wrote: >>>>> On Mon, Feb 26, 2018 at 07:44:28PM -0800, Srivatsa S. Bhat wrote: >>>>>> On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote: >>>>>>> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote: >>>>>>>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote: >>>>>>>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman: >>>>>>>>>> 4.13-stable review patch. If anyone has any objections, please let >>>>>>>>>> me know. >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> From: Steve French <smfre...@gmail.com> >>>>>>>>>> >>>>>>>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream. >>>>>>>>>> >>>>>>>>>> According to MS-SMB2 3.2.55 validate_negotiate request must >>>>>>>>>> always be signed. 
Some Windows can fail the request if you send it >>>>>>>>>> unsigned >>>>>>>>>> >>>>>>>>>> See kernel bugzilla bug 197311 >>>>>>>>>> >>>>>>>>>> Acked-by: Ronnie Sahlberg >>>>>>>>>> Signed-off-by: Steve French <smfre...@gmail.com> >>>>>>>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> fs/cifs/smb2pdu.c |3 +++ >>>>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>>>> >>>>>>>>>> --- a/fs/cifs/smb2pdu.c >>>>>>>>>> +++ b/fs/cifs/smb2pdu.c >>>>>>>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc >>>>>>>>>>} else >>>>>>>>>>iov[0].iov_len = get_rfc1002_length(req) + 4; >>>>>>>>>> + /* validate negotiate request must be signed - see MS-SMB2 >>>>>>>>>> 3.2.5.5 */ >>>>>>>>>> + if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) >>>>>>>>>> + req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED; >>>>>>>>>>rc = SendReceive2(xid, ses, iov, n_iov, _buftype, flags, >>>>>>>>>> _iov); >>>>>>>>>>cifs_small_buf_release(req); >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> This one needs to be backported to all stable kernels as the commit >>>>>>>>> that >>>>>>>>> introduced the regression: >>>>>>>>> ' >>>>>>>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9 >>>>>>>>> SMB: Validate negotiate (to protect against downgrade) even if >>>>>>>>> signing off >>>>>>>>> >>>>>>>>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73 >>>>>>>> >>>>>>>> Oh wait, it breaks the builds on older kernels, that's why I didn't >>>>>>>>
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 2/27/18 4:40 AM, Greg Kroah-Hartman wrote: > On Tue, Feb 27, 2018 at 01:22:31AM -0800, Srivatsa S. Bhat wrote: >> On 2/27/18 12:54 AM, Greg Kroah-Hartman wrote: >>> On Mon, Feb 26, 2018 at 07:44:28PM -0800, Srivatsa S. Bhat wrote: >>>> On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote: >>>>> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote: >>>>>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote: >>>>>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman: >>>>>>>> 4.13-stable review patch. If anyone has any objections, please let me >>>>>>>> know. >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> From: Steve French <smfre...@gmail.com> >>>>>>>> >>>>>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream. >>>>>>>> >>>>>>>> According to MS-SMB2 3.2.55 validate_negotiate request must >>>>>>>> always be signed. Some Windows can fail the request if you send it >>>>>>>> unsigned >>>>>>>> >>>>>>>> See kernel bugzilla bug 197311 >>>>>>>> >>>>>>>> Acked-by: Ronnie Sahlberg >>>>>>>> Signed-off-by: Steve French <smfre...@gmail.com> >>>>>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> >>>>>>>> >>>>>>>> --- >>>>>>>> fs/cifs/smb2pdu.c |3 +++ >>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>> >>>>>>>> --- a/fs/cifs/smb2pdu.c >>>>>>>> +++ b/fs/cifs/smb2pdu.c >>>>>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc >>>>>>>>} else >>>>>>>>iov[0].iov_len = get_rfc1002_length(req) + 4; >>>>>>>> + /* validate negotiate request must be signed - see MS-SMB2 >>>>>>>> 3.2.5.5 */ >>>>>>>> + if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) >>>>>>>> + req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED; >>>>>>>>rc = SendReceive2(xid, ses, iov, n_iov, _buftype, flags, >>>>>>>> _iov); >>>>>>>>cifs_small_buf_release(req); >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> This one needs to be backported to all stable kernels as the commit that >>>>>>> introduced the regression: >>>>>>> ' >>>>>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9 >>>>>>> SMB: Validate negotiate (to protect 
against downgrade) even if signing >>>>>>> off >>>>>>> >>>>>>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73 >>>>>> >>>>>> Oh wait, it breaks the builds on older kernels, that's why I didn't >>>>>> apply it :) >>>>>> >>>>>> Can you provide me with a working backport? >>>>>> >>>>> >>>>> Hi Steve, >>>>> >>>>> Is there a version of this fix available for stable kernels? >>>>> >>>> >>>> Hi Greg, >>>> >>>> Mounting SMB3 shares continues to fail for me on 4.4.118 and 4.9.84 >>>> due to the issues that I have described in detail on this mail thread. >>>> >>>> Since there is no apparent fix for this bug on stable kernels, could >>>> you please consider reverting the original commit that caused this >>>> regression? >>>> >>>> That commit was intended to enhance security, which is probably why it >>>> was backported to stable kernels in the first place; but instead it >>>> ends up breaking basic functionality itself (mounting). So in the >>>> absence of a proper fix, I don't see much of an option but to revert >>>> that commit. >>>> >>>> So, please consider reverting the following: >>>> >>>> commit 02ef29f9cbb616bf419 "SMB: Validate negotiate (to protect >>>> against downgrade) even if signing off" on 4.4.118 >>>> >>
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 2/27/18 12:54 AM, Greg Kroah-Hartman wrote: > On Mon, Feb 26, 2018 at 07:44:28PM -0800, Srivatsa S. Bhat wrote: >> On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote: >>> On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote: >>>> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote: >>>>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman: >>>>>> 4.13-stable review patch. If anyone has any objections, please let me >>>>>> know. >>>>>> >>>>>> -- >>>>>> >>>>>> From: Steve French <smfre...@gmail.com> >>>>>> >>>>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream. >>>>>> >>>>>> According to MS-SMB2 3.2.55 validate_negotiate request must >>>>>> always be signed. Some Windows can fail the request if you send it >>>>>> unsigned >>>>>> >>>>>> See kernel bugzilla bug 197311 >>>>>> >>>>>> Acked-by: Ronnie Sahlberg >>>>>> Signed-off-by: Steve French <smfre...@gmail.com> >>>>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> >>>>>> >>>>>> --- >>>>>> fs/cifs/smb2pdu.c |3 +++ >>>>>> 1 file changed, 3 insertions(+) >>>>>> >>>>>> --- a/fs/cifs/smb2pdu.c >>>>>> +++ b/fs/cifs/smb2pdu.c >>>>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc >>>>>> } else >>>>>> iov[0].iov_len = get_rfc1002_length(req) + 4; >>>>>> +/* validate negotiate request must be signed - see MS-SMB2 >>>>>> 3.2.5.5 */ >>>>>> +if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) >>>>>> +req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED; >>>>>> rc = SendReceive2(xid, ses, iov, n_iov, _buftype, flags, >>>>>> _iov); >>>>>> cifs_small_buf_release(req); >>>>>> >>>>>> >>>>>> >>>>> >>>>> This one needs to be backported to all stable kernels as the commit that >>>>> introduced the regression: >>>>> ' >>>>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9 >>>>> SMB: Validate negotiate (to protect against downgrade) even if signing off >>>>> >>>>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73 >>>> >>>> Oh wait, it breaks the builds on older kernels, that's why I didn't >>>> apply it :) >>>> 
>>>> Can you provide me with a working backport? >>>> >>> >>> Hi Steve, >>> >>> Is there a version of this fix available for stable kernels? >>> >> >> Hi Greg, >> >> Mounting SMB3 shares continues to fail for me on 4.4.118 and 4.9.84 >> due to the issues that I have described in detail on this mail thread. >> >> Since there is no apparent fix for this bug on stable kernels, could >> you please consider reverting the original commit that caused this >> regression? >> >> That commit was intended to enhance security, which is probably why it >> was backported to stable kernels in the first place; but instead it >> ends up breaking basic functionality itself (mounting). So in the >> absence of a proper fix, I don't see much of an option but to revert >> that commit. >> >> So, please consider reverting the following: >> >> commit 02ef29f9cbb616bf419 "SMB: Validate negotiate (to protect >> against downgrade) even if signing off" on 4.4.118 >> >> commit 0e1b85a41a25ac888fb "SMB: Validate negotiate (to protect >> against downgrade) even if signing off" on 4.9.84 >> >> They correspond to commit 0603c96f3af50e2f9299fa410c224ab1d465e0f9 >> upstream. Both these patches should revert cleanly. > > Do you still have this same problem on 4.14 and 4.15? If so, the issue > needs to get fixed there, not papered-over by reverting these old > changes, as you will hit the issue again in the future when you update > to a newer kernel version. > 4.14 and 4.15 work great! (I had mentioned this in my original bug report but forgot to summarize it here, sorry). Thank you! Regards, Srivatsa
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote: > On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote: >> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote: >>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman: >>>> 4.13-stable review patch. If anyone has any objections, please let me >>>> know. >>>> >>>> -- >>>> >>>> From: Steve French <smfre...@gmail.com> >>>> >>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream. >>>> >>>> According to MS-SMB2 3.2.55 validate_negotiate request must >>>> always be signed. Some Windows can fail the request if you send it unsigned >>>> >>>> See kernel bugzilla bug 197311 >>>> >>>> Acked-by: Ronnie Sahlberg >>>> Signed-off-by: Steve French <smfre...@gmail.com> >>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> >>>> >>>> --- >>>> fs/cifs/smb2pdu.c |3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> --- a/fs/cifs/smb2pdu.c >>>> +++ b/fs/cifs/smb2pdu.c >>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc >>>>} else >>>>iov[0].iov_len = get_rfc1002_length(req) + 4; >>>> + /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */ >>>> + if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) >>>> + req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED; >>>>rc = SendReceive2(xid, ses, iov, n_iov, _buftype, flags, _iov); >>>>cifs_small_buf_release(req); >>>> >>>> >>>> >>> >>> This one needs to be backported to all stable kernels as the commit that >>> introduced the regression: >>> ' >>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9 >>> SMB: Validate negotiate (to protect against downgrade) even if signing off >>> >>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73 >> >> Oh wait, it breaks the builds on older kernels, that's why I didn't >> apply it :) >> >> Can you provide me with a working backport? >> > > Hi Steve, > > Is there a version of this fix available for stable kernels? 
> Hi Greg, Mounting SMB3 shares continues to fail for me on 4.4.118 and 4.9.84 due to the issues that I have described in detail on this mail thread. Since there is no apparent fix for this bug on stable kernels, could you please consider reverting the original commit that caused this regression? That commit was intended to enhance security, which is probably why it was backported to stable kernels in the first place; but instead it ends up breaking basic functionality itself (mounting). So in the absence of a proper fix, I don't see much of an option but to revert that commit. So, please consider reverting the following: commit 02ef29f9cbb616bf419 "SMB: Validate negotiate (to protect against downgrade) even if signing off" on 4.4.118 commit 0e1b85a41a25ac888fb "SMB: Validate negotiate (to protect against downgrade) even if signing off" on 4.9.84 They correspond to commit 0603c96f3af50e2f9299fa410c224ab1d465e0f9 upstream. Both these patches should revert cleanly. Thank you! Regards, Srivatsa
Re: [PATCH 2/2] block, char_dev: Use correct format specifier for unsigned ints
On 2/6/18 2:24 AM, Greg KH wrote: > On Mon, Feb 05, 2018 at 06:25:27PM -0800, Srivatsa S. Bhat wrote: >> From: Srivatsa S. Bhat <sriva...@csail.mit.edu> >> >> register_blkdev() and __register_chrdev_region() treat the major >> number as an unsigned int. So print it the same way to avoid >> absurd error statements such as: >> "... major requested (-1) is greater than the maximum (511) ..." >> (and also fix off-by-one bugs in the error prints). >> >> While at it, also update the comment describing register_blkdev(). >> >> Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> >> --- >> >> block/genhd.c | 19 +++ >> fs/char_dev.c |4 ++-- >> 2 files changed, 13 insertions(+), 10 deletions(-) >> >> diff --git a/block/genhd.c b/block/genhd.c >> index 88a53c1..21a18e5 100644 >> --- a/block/genhd.c >> +++ b/block/genhd.c >> @@ -308,19 +308,22 @@ void blkdev_show(struct seq_file *seqf, off_t offset) >> /** >> * register_blkdev - register a new block device >> * >> - * @major: the requested major device number [1..255]. If @major = 0, try to >> - * allocate any unused major number. >> + * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If >> + * @major = 0, try to allocate any unused major number. >> * @name: the name of the new block device as a zero terminated string >> * >> * The @name must be unique within the system. 
>> * >> * The return value depends on the @major input parameter: >> * >> - * - if a major device number was requested in range [1..255] then the >> - *function returns zero on success, or a negative error code >> + * - if a major device number was requested in range >> [1..BLKDEV_MAJOR_MAX-1] >> + *then the function returns zero on success, or a negative error code >> * - if any unused major number was requested with @major = 0 parameter >> *then the return value is the allocated major number in range >> - *[1..255] or a negative error code otherwise >> + *[1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise >> + * >> + * See Documentation/admin-guide/devices.txt for the list of allocated >> + * major numbers. >> */ >> int register_blkdev(unsigned int major, const char *name) >> { >> @@ -347,8 +350,8 @@ int register_blkdev(unsigned int major, const char *name) >> } >> >> if (major >= BLKDEV_MAJOR_MAX) { >> -pr_err("register_blkdev: major requested (%d) is greater than >> the maximum (%d) for %s\n", >> - major, BLKDEV_MAJOR_MAX, name); >> +pr_err("register_blkdev: major requested (%u) is greater than >> the maximum (%u) for %s\n", >> + major, BLKDEV_MAJOR_MAX-1, name); >> >> ret = -EINVAL; >> goto out; >> @@ -375,7 +378,7 @@ int register_blkdev(unsigned int major, const char *name) >> ret = -EBUSY; >> >> if (ret < 0) { >> -printk("register_blkdev: cannot get major %d for %s\n", >> +printk("register_blkdev: cannot get major %u for %s\n", >> major, name); >> kfree(p); >> } >> diff --git a/fs/char_dev.c b/fs/char_dev.c >> index 33c9385..a279c58 100644 >> --- a/fs/char_dev.c >> +++ b/fs/char_dev.c >> @@ -121,8 +121,8 @@ __register_chrdev_region(unsigned int major, unsigned >> int baseminor, >> } >> >> if (major >= CHRDEV_MAJOR_MAX) { >> -pr_err("CHRDEV \"%s\" major requested (%d) is greater than the >> maximum (%d)\n", >> - name, major, CHRDEV_MAJOR_MAX); >> +pr_err("CHRDEV \"%s\" major requested (%u) is greater than the >> maximum (%u)\n", >> + name, major, 
CHRDEV_MAJOR_MAX-1); >> ret = -EINVAL; >> goto out; >> } > > Thanks for both of these patches, if Al doesn't grab them, I will after > 4.16-rc1 comes out. > Sounds great! Thank you! Regards, Srivatsa
[PATCH 2/2] block, char_dev: Use correct format specifier for unsigned ints
From: Srivatsa S. Bhat <sriva...@csail.mit.edu> register_blkdev() and __register_chrdev_region() treat the major number as an unsigned int. So print it the same way to avoid absurd error statements such as: "... major requested (-1) is greater than the maximum (511) ..." (and also fix off-by-one bugs in the error prints). While at it, also update the comment describing register_blkdev(). Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> --- block/genhd.c | 19 +++ fs/char_dev.c |4 ++-- 2 files changed, 13 insertions(+), 10 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 88a53c1..21a18e5 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -308,19 +308,22 @@ void blkdev_show(struct seq_file *seqf, off_t offset) /** * register_blkdev - register a new block device * - * @major: the requested major device number [1..255]. If @major = 0, try to - * allocate any unused major number. + * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If + * @major = 0, try to allocate any unused major number. * @name: the name of the new block device as a zero terminated string * * The @name must be unique within the system. * * The return value depends on the @major input parameter: * - * - if a major device number was requested in range [1..255] then the - *function returns zero on success, or a negative error code + * - if a major device number was requested in range [1..BLKDEV_MAJOR_MAX-1] + *then the function returns zero on success, or a negative error code * - if any unused major number was requested with @major = 0 parameter *then the return value is the allocated major number in range - *[1..255] or a negative error code otherwise + *[1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise + * + * See Documentation/admin-guide/devices.txt for the list of allocated + * major numbers. 
*/ int register_blkdev(unsigned int major, const char *name) { @@ -347,8 +350,8 @@ int register_blkdev(unsigned int major, const char *name) } if (major >= BLKDEV_MAJOR_MAX) { - pr_err("register_blkdev: major requested (%d) is greater than the maximum (%d) for %s\n", - major, BLKDEV_MAJOR_MAX, name); + pr_err("register_blkdev: major requested (%u) is greater than the maximum (%u) for %s\n", + major, BLKDEV_MAJOR_MAX-1, name); ret = -EINVAL; goto out; @@ -375,7 +378,7 @@ int register_blkdev(unsigned int major, const char *name) ret = -EBUSY; if (ret < 0) { - printk("register_blkdev: cannot get major %d for %s\n", + printk("register_blkdev: cannot get major %u for %s\n", major, name); kfree(p); } diff --git a/fs/char_dev.c b/fs/char_dev.c index 33c9385..a279c58 100644 --- a/fs/char_dev.c +++ b/fs/char_dev.c @@ -121,8 +121,8 @@ __register_chrdev_region(unsigned int major, unsigned int baseminor, } if (major >= CHRDEV_MAJOR_MAX) { - pr_err("CHRDEV \"%s\" major requested (%d) is greater than the maximum (%d)\n", - name, major, CHRDEV_MAJOR_MAX); + pr_err("CHRDEV \"%s\" major requested (%u) is greater than the maximum (%u)\n", + name, major, CHRDEV_MAJOR_MAX-1); ret = -EINVAL; goto out; }
[PATCH 1/2] char_dev: Fix off-by-one bugs in find_dynamic_major()
From: Srivatsa S. Bhat <sriva...@csail.mit.edu> CHRDEV_MAJOR_DYN_END and CHRDEV_MAJOR_DYN_EXT_END are valid major numbers. So fix the loop iteration to include them in the search for free major numbers. While at it, also remove a redundant if condition ("cd->major != i"), as it will never be true. Signed-off-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> --- fs/char_dev.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/char_dev.c b/fs/char_dev.c index a65e4a5..33c9385 100644 --- a/fs/char_dev.c +++ b/fs/char_dev.c @@ -67,18 +67,18 @@ static int find_dynamic_major(void) int i; struct char_device_struct *cd; - for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) { + for (i = ARRAY_SIZE(chrdevs)-1; i >= CHRDEV_MAJOR_DYN_END; i--) { if (chrdevs[i] == NULL) return i; } for (i = CHRDEV_MAJOR_DYN_EXT_START; -i > CHRDEV_MAJOR_DYN_EXT_END; i--) { +i >= CHRDEV_MAJOR_DYN_EXT_END; i--) { for (cd = chrdevs[major_to_index(i)]; cd; cd = cd->next) if (cd->major == i) break; - if (cd == NULL || cd->major != i) + if (cd == NULL) return i; }
Re: Change in register_blkdev() behavior
On 2/1/18 5:10 PM, Logan Gunthorpe wrote: > > > On 01/02/18 05:23 PM, Srivatsa S. Bhat wrote: >> static int find_dynamic_major(void) >> { >> int i; >> struct char_device_struct *cd; >> >> for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) { >> >> As far as I can see, _DYN_END is inclusive, so shouldn't this be >= ? > > Yes, it looks like _DYN_END should have been inclusive based on the way I > documented it. > Thank you for confirming! I'll send a patch to fix that (and the analogous case for CHRDEV_MAJOR_DYN_EXT_END). >> >> for (cd = chrdevs[major_to_index(i)]; cd; cd = cd->next) >> if (cd->major == i) >> break; >> >> if (cd == NULL || cd->major != i) >> >> It seems that this latter condition is unnecessary, as it will never be >> true. (We'll reach here only if cd == NULL or cd->major == i). > > Not quite. chrdevs[] may contain majors that also hit on the hash but don't > equal 'i'. So the for loop will iterate through all hashes matching 'i' and > if there is one or more and they all don't match 'i', it will fall through > the loop and cd will be set to something non-null and not equal to i. > Hmm, the code doesn't appear to be doing that though? The loop's fall through occurs one past the last entry, when cd == NULL. The only other way it can exit the loop is if it hits the break statement (which implies that cd->major == i). So what am I missing? Regards, Srivatsa
Re: Change in register_blkdev() behavior
On 1/31/18 6:24 AM, Greg KH wrote: > On Tue, Jan 30, 2018 at 04:56:32PM -0800, Srivatsa S. Bhat wrote: >> >> Hi, >> >> Before commit 133d55cdb2f "block: order /proc/devices by major number", >> if register_blkdev() was called with major = [1..UINT_MAX], it used to >> succeed (provided the requested major number was actually free). > > How was LTP calling register_blkdev() with such crazy numbers? > Haha :-) No idea! > Anyway, I agree with Logan, this sounds like something to be resolved in > LTP, as allowing block devices with numbers greater than the number we > really allow seems like an odd requirement :) > Agreed! And thanks for confirming! Regards, Srivatsa
Re: Change in register_blkdev() behavior
Hi Logan, On 1/30/18 5:26 PM, Logan Gunthorpe wrote: > > > On 30/01/18 05:56 PM, Srivatsa S. Bhat wrote: >> If the restriction on the major number was intentional, perhaps we >> should get the LTP testcase modified for kernel versions >= 4.14. >> Otherwise, we should fix register_blkdev to preserve the old behavior. >> (I guess the same thing applies to commit 8a932f73e5b "char_dev: order >> /proc/devices by major number" as well). > > The restriction was put in place so the code that prints the devices doesn't > have to run through every integer in order to print the devices in order. > > Given the existing documented fixed numbers in [1] and that future new char > devices should be using dynamic allocation, this seemed like a reasonable > restriction. > > It would be pretty trivial to increase the limit but, IMO, setting it to > UINT_MAX seems a bit much. Especially given that a lot of the documentation > and code still very much has remnants of the 256 limit. (The series that > included this patch only just expanded the char dynamic range to above 256). > So, I'd suggest the LTP test should change. > Sounds good! Thank you! By the way, I happened to notice a few minor issues with the find_dynamic_major() function added by this patch series: static int find_dynamic_major(void) { int i; struct char_device_struct *cd; for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) { As far as I can see, _DYN_END is inclusive, so shouldn't this be >= ? if (chrdevs[i] == NULL) return i; } for (i = CHRDEV_MAJOR_DYN_EXT_START; i > CHRDEV_MAJOR_DYN_EXT_END; i--) { Same here; I believe this should be >= for (cd = chrdevs[major_to_index(i)]; cd; cd = cd->next) if (cd->major == i) break; if (cd == NULL || cd->major != i) It seems that this latter condition is unnecessary, as it will never be true. (We'll reach here only if cd == NULL or cd->major == i). return i; } return -EBUSY; } Regards, Srivatsa
Change in register_blkdev() behavior
Hi,

Before commit 133d55cdb2f "block: order /proc/devices by major number",
if register_blkdev() was called with major = [1..UINT_MAX], it used to
succeed (provided the requested major number was actually free).

However, while fixing the ordering in /proc/devices, commit 133d55cdb2f
also added this change:

@@ -309,6 +309,14 @@ int register_blkdev(unsigned int major, const char *name)
 		ret = major;
 	}
 
+	if (major >= BLKDEV_MAJOR_MAX) {
+		pr_err("register_blkdev: major requested (%d) is greater than the maximum (%d) for %s\n",
+		       major, BLKDEV_MAJOR_MAX, name);
+
+		ret = -EINVAL;
+		goto out;
+	}
+
 	p = kmalloc(sizeof(struct blk_major_name), GFP_KERNEL);
 	if (p == NULL) {
 		ret = -ENOMEM;

So, after this commit, calls to register_blkdev() fail if the requested
major number is >= 512 (BLKDEV_MAJOR_MAX).

I'm wondering if this was an intentional change or not, as it wasn't
explicitly called out in the changelog (and the comment on top of
register_blkdev() describing its inputs seems quite out-of-date).

This also breaks LTP testcase block_dev/tc05, which tests for edge-cases
and expects register_blkdev() to succeed with major=UINT_MAX.

If the restriction on the major number was intentional, perhaps we
should get the LTP testcase modified for kernel versions >= 4.14.
Otherwise, we should fix register_blkdev to preserve the old behavior.
(I guess the same thing applies to commit 8a932f73e5b "char_dev: order
/proc/devices by major number" as well).

Thoughts?

Regards,
Srivatsa
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
Hi Aurélien,

On 1/19/18 5:23 AM, Aurélien Aptel wrote:
> Hi,
>
> "Srivatsa S. Bhat" <sriva...@csail.mit.edu> writes:
>>> Any thoughts on what is the right fix for stable kernels? Mounting SMB3
>>> shares works great on mainline (v4.15-rc5). It also works on 4.4.109 if
>>> I pass the sec=ntlmsspi option to the mount command (as opposed to the
>>> default: sec=ntlmssp). Please let me know if you need any other info.
>
> Make sure you have (in that order):
>
> db3b5474f462 ("CIFS: Fix NULL pointer deref on SMB2_tcon() failure")
> fe83bebc0522 ("SMB: fix leak of validate negotiate info response buffer")
> a2d9daad1d2d ("SMB: fix validate negotiate info uninitialised memory use")
> 4587eee04e2a ("SMB3: Validate negotiate request must always be signed")
> a821df3f1af7 ("cifs: fix NULL deref in SMB2_read")
>
> Does enabling CIFS_SMB311 changes anything?
>
Thank you for looking into this. I tried applying these patches on top
of 4.4.113 and 4.9.78, but that didn't fix the problem on either kernel,
with or without CONFIG_CIFS_SMB311 enabled. (By the way, shouldn't these
patches be applied to stable kernels anyway? I was a bit surprised that
none of them are present in 4.4.113 and 4.9.78).

> I also suspect some things assume encryption patches are in.
>
Do you happen to know which patches they might be? In any case, I'm
using the latest (unmodified) 4.4 and 4.9 stable kernels, so I hope the
necessary support is already present in them.

The 5 patches you suggested above needed a bit of fixup by hand for
4.4.113, so I have shared my combined patch below for reference, which
applies cleanly on top of 4.4.113. (The same patch applies on 4.9.78 as
well, with some minor line-number differences).
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index f2ff60e..92abb8b9 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -519,7 +519,7 @@ int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
 {
 	int rc = 0;
 	struct validate_negotiate_info_req vneg_inbuf;
-	struct validate_negotiate_info_rsp *pneg_rsp;
+	struct validate_negotiate_info_rsp *pneg_rsp = NULL;
 	u32 rsplen;
 
 	cifs_dbg(FYI, "validate negotiate\n");
@@ -575,8 +575,9 @@ int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
 			 rsplen);
 
 		/* relax check since Mac returns max bufsize allowed on ioctl */
-		if (rsplen > CIFSMaxBufSize)
-			return -EIO;
+		if ((rsplen > CIFSMaxBufSize)
+		    || (rsplen < sizeof(struct validate_negotiate_info_rsp)))
+			goto err_rsp_free;
 	}
 
 	/* check validate negotiate info response matches what we got earlier */
@@ -595,10 +596,13 @@ int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
 
 	/* validate negotiate successful */
 	cifs_dbg(FYI, "validate negotiate info successful\n");
+	kfree(pneg_rsp);
 	return 0;
 
 vneg_out:
 	cifs_dbg(VFS, "protocol revalidation - security settings mismatch\n");
+err_rsp_free:
+	kfree(pneg_rsp);
 	return -EIO;
 }
@@ -1042,7 +1046,7 @@ tcon_exit:
 	return rc;
 
 tcon_error_exit:
-	if (rsp->hdr.Status == STATUS_BAD_NETWORK_NAME) {
+	if (rsp && rsp->hdr.Status == STATUS_BAD_NETWORK_NAME) {
 		cifs_dbg(VFS, "BAD_NETWORK_NAME: %s\n", tree);
 	}
 	goto tcon_exit;
@@ -1559,6 +1563,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
 	} else
 		iov[0].iov_len = get_rfc1002_length(req) + 4;
 
+	/* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */
+	if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO)
+		req->hdr.Flags |= SMB2_FLAGS_SIGNED;
 	rc = SendReceive2(xid, ses, iov, num_iovecs, &resp_buftype, 0);
 	rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base;
@@ -2159,23 +2166,22 @@ SMB2_read(const unsigned int xid, struct cifs_io_parms *io_parms,
 
 	rsp = (struct smb2_read_rsp *)iov[0].iov_base;
 
-	if (rsp->hdr.Status == STATUS_END_OF_FILE) {
+	if (rc) {
+		if (rc != -ENODATA) {
+			cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE);
+			cifs_dbg(VFS, "Send error in read = %d\n", rc);
+		}
 		free_rsp_buf(resp_buftype, iov[0].iov_base);
-		return 0;
+		return rc == -ENODATA ? 0 : rc;
 	}
-	if (rc) {
-		cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE);
-		cifs_dbg(VFS, "Send error in read = %d\n", rc);
-	} else {
-		*nbytes = le32_to_cpu(rsp->
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 1/3/18 6:15 PM, Srivatsa S. Bhat wrote: > On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote: >> On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote: >>> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman: >>>> 4.13-stable review patch. If anyone has any objections, please let me >>>> know. >>>> >>>> -- >>>> >>>> From: Steve French <smfre...@gmail.com> >>>> >>>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream. >>>> >>>> According to MS-SMB2 3.2.55 validate_negotiate request must >>>> always be signed. Some Windows can fail the request if you send it unsigned >>>> >>>> See kernel bugzilla bug 197311 >>>> >>>> Acked-by: Ronnie Sahlberg >>>> Signed-off-by: Steve French <smfre...@gmail.com> >>>> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org> >>>> >>>> --- >>>> fs/cifs/smb2pdu.c |3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> --- a/fs/cifs/smb2pdu.c >>>> +++ b/fs/cifs/smb2pdu.c >>>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc >>>>} else >>>>iov[0].iov_len = get_rfc1002_length(req) + 4; >>>> + /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */ >>>> + if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) >>>> + req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED; >>>>rc = SendReceive2(xid, ses, iov, n_iov, _buftype, flags, _iov); >>>>cifs_small_buf_release(req); >>>> >>>> >>>> >>> >>> This one needs to be backported to all stable kernels as the commit that >>> introduced the regression: >>> ' >>> 0603c96f3af50e2f9299fa410c224ab1d465e0f9 >>> SMB: Validate negotiate (to protect against downgrade) even if signing off >>> >>> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73 >> >> Oh wait, it breaks the builds on older kernels, that's why I didn't >> apply it :) >> >> Can you provide me with a working backport? >> > > Hi Steve, > > Is there a version of this fix available for stable kernels? > Any thoughts on this? 
Regards, Srivatsa > I tried applying this patch to 4.4.109 (and a similar one for 4.9.74), > but it didn't fix the problem. Instead, I actually got a NULL pointer > dereference when I tried to mount an SMB3 share. > > Here is the patch I tried on 4.4.109: > > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c > index f2ff60e..3963bd2 100644 > --- a/fs/cifs/smb2pdu.c > +++ b/fs/cifs/smb2pdu.c > @@ -1559,6 +1559,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon > *tcon, u64 persistent_fid, > } else > iov[0].iov_len = get_rfc1002_length(req) + 4; > > + /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */ > + if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) > + req->hdr.Flags |= SMB2_FLAGS_SIGNED; > > rc = SendReceive2(xid, ses, iov, num_iovecs, _buftype, 0); > rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base; > > > This results in the following NULL pointer dereference when I try > mounting: > > # mount -vvv -t cifs -o vers=3.0,credentials=.smbcred ///TestSMB/ > testdir > > [ 53.073057] BUG: unable to handle kernel NULL pointer dereference at > 0050 > [ 53.073511] IP: [] crypto_shash_setkey+0x1a/0xc0 > [ 53.073973] PGD 0 > [ 53.074427] Oops: [#1] SMP > [ 53.074946] Modules linked in: arc4(E) ecb(E) md4(E) cifs(E) > dns_resolver(E) vmw_vsock_vmci_transport(E) vsock(E) hid_generic(E) usbhid(E) > hid(E) xt_conntrack(E) mousedev(E) iptable_nat(E) nf_conntrack_ipv4(E) > nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) iptable_filter(E) ip_tables(E) > crc32c_intel(E) xt_LOG(E) nf_conntrack(E) jitterentropy_rng(E) hmac(E) > sha256_ssse3(E) sha256_generic(E) drbg(E) vmw_balloon(E) ansi_cprng(E) > aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) > cryptd(E) psmouse(E) evdev(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) usbcore(E) > intel_agp(E) usb_common(E) vmw_vmci(E) i2c_piix4(E) intel_gtt(E) nfit(E) > battery(E) tpm_tis(E) tpm(E) ac(E) button(E) sch_fq_codel(E) autofs4(E) > [ 53.079435] CPU: 3 PID: 829 Comm: mount.cifs Tainted: GE > 
4.4.109-possible-fix1+ #21 > [ 53.079983] Hardware name: VMware, Inc. VMware Virtual Platform/440BX > Desktop Reference Platform, BIOS 6.00 04/05/2016 >
Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed
On 11/1/17 8:18 AM, Greg Kroah-Hartman wrote: > On Tue, Oct 31, 2017 at 03:02:11PM +0200, Thomas Backlund wrote: >> Den 31.10.2017 kl. 11:55, skrev Greg Kroah-Hartman: >>> 4.13-stable review patch. If anyone has any objections, please let me know. >>> >>> -- >>> >>> From: Steve French>>> >>> commit 4587eee04e2ac7ac3ac9fa2bc164fb6e548f99cd upstream. >>> >>> According to MS-SMB2 3.2.55 validate_negotiate request must >>> always be signed. Some Windows can fail the request if you send it unsigned >>> >>> See kernel bugzilla bug 197311 >>> >>> Acked-by: Ronnie Sahlberg >>> Signed-off-by: Steve French >>> Signed-off-by: Greg Kroah-Hartman >>> >>> --- >>> fs/cifs/smb2pdu.c |3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> --- a/fs/cifs/smb2pdu.c >>> +++ b/fs/cifs/smb2pdu.c >>> @@ -1963,6 +1963,9 @@ SMB2_ioctl(const unsigned int xid, struc >>> } else >>> iov[0].iov_len = get_rfc1002_length(req) + 4; >>> + /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */ >>> + if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) >>> + req->hdr.sync_hdr.Flags |= SMB2_FLAGS_SIGNED; >>> rc = SendReceive2(xid, ses, iov, n_iov, _buftype, flags, _iov); >>> cifs_small_buf_release(req); >>> >>> >>> >> >> This one needs to be backported to all stable kernels as the commit that >> introduced the regression: >> ' >> 0603c96f3af50e2f9299fa410c224ab1d465e0f9 >> SMB: Validate negotiate (to protect against downgrade) even if signing off >> >> is backported in stable trees as of: 4.9.53, 4.4.90, 3.18.73 > > Oh wait, it breaks the builds on older kernels, that's why I didn't > apply it :) > > Can you provide me with a working backport? > Hi Steve, Is there a version of this fix available for stable kernels? I tried applying this patch to 4.4.109 (and a similar one for 4.9.74), but it didn't fix the problem. Instead, I actually got a NULL pointer dereference when I tried to mount an SMB3 share. 
Here is the patch I tried on 4.4.109: diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index f2ff60e..3963bd2 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -1559,6 +1559,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid, } else iov[0].iov_len = get_rfc1002_length(req) + 4; + /* validate negotiate request must be signed - see MS-SMB2 3.2.5.5 */ + if (opcode == FSCTL_VALIDATE_NEGOTIATE_INFO) + req->hdr.Flags |= SMB2_FLAGS_SIGNED; rc = SendReceive2(xid, ses, iov, num_iovecs, _buftype, 0); rsp = (struct smb2_ioctl_rsp *)iov[0].iov_base; This results in the following NULL pointer dereference when I try mounting: # mount -vvv -t cifs -o vers=3.0,credentials=.smbcred ///TestSMB/ testdir [ 53.073057] BUG: unable to handle kernel NULL pointer dereference at 0050 [ 53.073511] IP: [] crypto_shash_setkey+0x1a/0xc0 [ 53.073973] PGD 0 [ 53.074427] Oops: [#1] SMP [ 53.074946] Modules linked in: arc4(E) ecb(E) md4(E) cifs(E) dns_resolver(E) vmw_vsock_vmci_transport(E) vsock(E) hid_generic(E) usbhid(E) hid(E) xt_conntrack(E) mousedev(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) iptable_filter(E) ip_tables(E) crc32c_intel(E) xt_LOG(E) nf_conntrack(E) jitterentropy_rng(E) hmac(E) sha256_ssse3(E) sha256_generic(E) drbg(E) vmw_balloon(E) ansi_cprng(E) aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) cryptd(E) psmouse(E) evdev(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) usbcore(E) intel_agp(E) usb_common(E) vmw_vmci(E) i2c_piix4(E) intel_gtt(E) nfit(E) battery(E) tpm_tis(E) tpm(E) ac(E) button(E) sch_fq_codel(E) autofs4(E) [ 53.079435] CPU: 3 PID: 829 Comm: mount.cifs Tainted: GE 4.4.109-possible-fix1+ #21 [ 53.079983] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016 [ 53.081086] task: 8800b4f41940 ti: 8800b92ac000 task.ti: 8800b92ac000 [ 53.081667] RIP: 0010:[] [] crypto_shash_setkey+0x1a/0xc0 [ 53.082247] RSP: 0018:8800b92af9a8 EFLAGS: 00010282 [ 53.082604] systemd-journald[284]: Compressed data object 721 -> 468 using XZ [ 53.083419] RAX: 8800af5943c0 RBX: 8800b484a800 RCX: 0ec7 [ 53.084001] RDX: 0010 RSI: 8800b900af18 RDI: [ 53.084602] RBP: 8800b92af9e0 R08: 8800b92afb64 R09: [ 53.085184] R10: 3031322e3030312e R11: 07f5 R12: 0002 [ 53.085755] R13: R14: 8800b900af18 R15: 0010 [ 53.086333] FS: 7fb659b45740() GS:88013fcc() knlGS: [ 53.086907] CS: 0010 DS: ES: CR0: 80050033 [ 53.087480] CR2: 0050 CR3: b797 CR4: 001606e0 [ 53.088107] Stack: [ 53.088681]
Re: [tip:smp/hotplug] cpu/hotplug: Restructure FROZEN state handling
On 3/1/16 2:51 PM, tip-bot for Thomas Gleixner wrote: > Commit-ID: 090e77c391dd983c8945b8e2e16d09f378d2e334 > Gitweb: http://git.kernel.org/tip/090e77c391dd983c8945b8e2e16d09f378d2e334 > Author: Thomas Gleixner <t...@linutronix.de> > AuthorDate: Fri, 26 Feb 2016 18:43:23 + > Committer: Thomas Gleixner <t...@linutronix.de> > CommitDate: Tue, 1 Mar 2016 20:36:53 +0100 > > cpu/hotplug: Restructure FROZEN state handling > > There are only a few callbacks which really care about FROZEN > vs. !FROZEN. No need to have extra states for this. > > Publish the frozen state in an extra variable which is updated under > the hotplug lock and let the users interested deal with it w/o > imposing that extra state checks on everyone. > > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > Cc: linux-a...@vger.kernel.org > Cc: Rik van Riel <r...@redhat.com> > Cc: Rafael Wysocki <rafael.j.wyso...@intel.com> > Cc: "Srivatsa S. Bhat" <sriva...@mit.edu> > Cc: Peter Zijlstra <pet...@infradead.org> > Cc: Arjan van de Ven <ar...@linux.intel.com> > Cc: Sebastian Siewior <bige...@linutronix.de> > Cc: Rusty Russell <ru...@rustcorp.com.au> > Cc: Steven Rostedt <rost...@goodmis.org> > Cc: Oleg Nesterov <o...@redhat.com> > Cc: Tejun Heo <t...@kernel.org> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Paul McKenney <paul...@linux.vnet.ibm.com> > Cc: Linus Torvalds <torva...@linux-foundation.org> > Cc: Paul Turner <p...@google.com> > Link: http://lkml.kernel.org/r/20160226182340.334912...@linutronix.de > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > --- > include/linux/cpu.h | 2 ++ > kernel/cpu.c| 69 > ++--- > 2 files changed, 31 insertions(+), 40 deletions(-) > > diff --git a/include/linux/cpu.h b/include/linux/cpu.h > index d2ca8c3..f2fb549 100644 > --- a/include/linux/cpu.h > +++ b/include/linux/cpu.h > @@ -118,6 +118,7 @@ enum { > > > #ifdef CONFIG_SMP > +extern bool cpuhp_tasks_frozen; > /* Need to know about CPUs going up/down? 
*/ > #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) > #define cpu_notifier(fn, pri) { \ > @@ -177,6 +178,7 @@ extern void cpu_maps_update_done(void); > #define cpu_notifier_register_done cpu_maps_update_done > > #else/* CONFIG_SMP */ > +#define cpuhp_tasks_frozen 0 > > #define cpu_notifier(fn, pri)do { (void)(fn); } while (0) > #define __cpu_notifier(fn, pri) do { (void)(fn); } while (0) > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 5b9d396..41a6cb8 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -29,6 +29,8 @@ > #ifdef CONFIG_SMP > /* Serializes the updates to cpu_online_mask, cpu_present_mask */ > static DEFINE_MUTEX(cpu_add_remove_lock); > +bool cpuhp_tasks_frozen; > +EXPORT_SYMBOL_GPL(cpuhp_tasks_frozen); > One small nitpick though: we don't need to export this symbol yet; it can be deferred until the callbacks that need it are actually modified to use this value (presumably in a later patchset). Regards, Srivatsa S. Bhat
Re: [tip:smp/hotplug] cpu/hotplug: Restructure FROZEN state handling
On 3/1/16 2:51 PM, tip-bot for Thomas Gleixner wrote: > Commit-ID: 090e77c391dd983c8945b8e2e16d09f378d2e334 > Gitweb: http://git.kernel.org/tip/090e77c391dd983c8945b8e2e16d09f378d2e334 > Author: Thomas Gleixner <t...@linutronix.de> > AuthorDate: Fri, 26 Feb 2016 18:43:23 + > Committer: Thomas Gleixner <t...@linutronix.de> > CommitDate: Tue, 1 Mar 2016 20:36:53 +0100 > > cpu/hotplug: Restructure FROZEN state handling > > There are only a few callbacks which really care about FROZEN > vs. !FROZEN. No need to have extra states for this. > > Publish the frozen state in an extra variable which is updated under > the hotplug lock and let the users interested deal with it w/o > imposing that extra state checks on everyone. > > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > Cc: linux-a...@vger.kernel.org > Cc: Rik van Riel <r...@redhat.com> > Cc: Rafael Wysocki <rafael.j.wyso...@intel.com> > Cc: "Srivatsa S. Bhat" <sriva...@mit.edu> > Cc: Peter Zijlstra <pet...@infradead.org> > Cc: Arjan van de Ven <ar...@linux.intel.com> > Cc: Sebastian Siewior <bige...@linutronix.de> > Cc: Rusty Russell <ru...@rustcorp.com.au> > Cc: Steven Rostedt <rost...@goodmis.org> > Cc: Oleg Nesterov <o...@redhat.com> > Cc: Tejun Heo <t...@kernel.org> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Paul McKenney <paul...@linux.vnet.ibm.com> > Cc: Linus Torvalds <torva...@linux-foundation.org> > Cc: Paul Turner <p...@google.com> > Link: http://lkml.kernel.org/r/20160226182340.334912...@linutronix.de > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > --- Reviewed-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> Regards, Srivatsa S. 
Bhat > include/linux/cpu.h | 2 ++ > kernel/cpu.c| 69 > ++--- > 2 files changed, 31 insertions(+), 40 deletions(-) > > diff --git a/include/linux/cpu.h b/include/linux/cpu.h > index d2ca8c3..f2fb549 100644 > --- a/include/linux/cpu.h > +++ b/include/linux/cpu.h > @@ -118,6 +118,7 @@ enum { > > > #ifdef CONFIG_SMP > +extern bool cpuhp_tasks_frozen; > /* Need to know about CPUs going up/down? */ > #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE) > #define cpu_notifier(fn, pri) { \ > @@ -177,6 +178,7 @@ extern void cpu_maps_update_done(void); > #define cpu_notifier_register_done cpu_maps_update_done > > #else/* CONFIG_SMP */ > +#define cpuhp_tasks_frozen 0 > > #define cpu_notifier(fn, pri)do { (void)(fn); } while (0) > #define __cpu_notifier(fn, pri) do { (void)(fn); } while (0) > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 5b9d396..41a6cb8 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -29,6 +29,8 @@ > #ifdef CONFIG_SMP > /* Serializes the updates to cpu_online_mask, cpu_present_mask */ > static DEFINE_MUTEX(cpu_add_remove_lock); > +bool cpuhp_tasks_frozen; > +EXPORT_SYMBOL_GPL(cpuhp_tasks_frozen); > > /* > * The following two APIs (cpu_maps_update_begin/done) must be used when > @@ -207,27 +209,30 @@ int __register_cpu_notifier(struct notifier_block *nb) > return raw_notifier_chain_register(_chain, nb); > } > > -static int __cpu_notify(unsigned long val, void *v, int nr_to_call, > +static int __cpu_notify(unsigned long val, unsigned int cpu, int nr_to_call, > int *nr_calls) > { > + unsigned long mod = cpuhp_tasks_frozen ? 
CPU_TASKS_FROZEN : 0; > + void *hcpu = (void *)(long)cpu; > + > int ret; > > - ret = __raw_notifier_call_chain(_chain, val, v, nr_to_call, > + ret = __raw_notifier_call_chain(_chain, val | mod, hcpu, nr_to_call, > nr_calls); > > return notifier_to_errno(ret); > } > > -static int cpu_notify(unsigned long val, void *v) > +static int cpu_notify(unsigned long val, unsigned int cpu) > { > - return __cpu_notify(val, v, -1, NULL); > + return __cpu_notify(val, cpu, -1, NULL); > } > > #ifdef CONFIG_HOTPLUG_CPU > > -static void cpu_notify_nofail(unsigned long val, void *v) > +static void cpu_notify_nofail(unsigned long val, unsigned int cpu) > { > - BUG_ON(cpu_notify(val, v)); > + BUG_ON(cpu_notify(val, cpu)); > } > EXPORT_SYMBOL(register_cpu_notifier); > EXPORT_SYMBOL(__register_cpu_notifier); > @@ -311,27 +316,21 @@ static inline void check_for_tasks(int dead_cpu) > read_unlock(_lock); > } > > -struct take_cpu_down_param { > - unsigned long mod; > - void *hcpu; > -}
Re: [tip:smp/hotplug] cpu/hotplug: Split out cpu down functions
On 3/1/16 2:52 PM, tip-bot for Thomas Gleixner wrote: > Commit-ID: 984581728eb4b2e10baed3d606f85a119795b207 > Gitweb: http://git.kernel.org/tip/984581728eb4b2e10baed3d606f85a119795b207 > Author: Thomas Gleixner <t...@linutronix.de> > AuthorDate: Fri, 26 Feb 2016 18:43:25 + > Committer: Thomas Gleixner <t...@linutronix.de> > CommitDate: Tue, 1 Mar 2016 20:36:53 +0100 > > cpu/hotplug: Split out cpu down functions > > Split cpu_down in separate functions in preparation for state machine > conversion. > > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > Cc: linux-a...@vger.kernel.org > Cc: Rik van Riel <r...@redhat.com> > Cc: Rafael Wysocki <rafael.j.wyso...@intel.com> > Cc: "Srivatsa S. Bhat" <sriva...@mit.edu> > Cc: Peter Zijlstra <pet...@infradead.org> > Cc: Arjan van de Ven <ar...@linux.intel.com> > Cc: Sebastian Siewior <bige...@linutronix.de> > Cc: Rusty Russell <ru...@rustcorp.com.au> > Cc: Steven Rostedt <rost...@goodmis.org> > Cc: Oleg Nesterov <o...@redhat.com> > Cc: Tejun Heo <t...@kernel.org> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Paul McKenney <paul...@linux.vnet.ibm.com> > Cc: Linus Torvalds <torva...@linux-foundation.org> > Cc: Paul Turner <p...@google.com> > Link: http://lkml.kernel.org/r/20160226182340.511796...@linutronix.de > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > --- Reviewed-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> Regards, Srivatsa S. 
Bhat > kernel/cpu.c | 83 > ++-- > 1 file changed, 53 insertions(+), 30 deletions(-) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 15a4136..0b5d259 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -266,11 +266,6 @@ static int bringup_cpu(unsigned int cpu) > } > > #ifdef CONFIG_HOTPLUG_CPU > - > -static void cpu_notify_nofail(unsigned long val, unsigned int cpu) > -{ > - BUG_ON(cpu_notify(val, cpu)); > -} > EXPORT_SYMBOL(register_cpu_notifier); > EXPORT_SYMBOL(__register_cpu_notifier); > > @@ -353,6 +348,25 @@ static inline void check_for_tasks(int dead_cpu) > read_unlock(_lock); > } > > +static void cpu_notify_nofail(unsigned long val, unsigned int cpu) > +{ > + BUG_ON(cpu_notify(val, cpu)); > +} > + > +static int notify_down_prepare(unsigned int cpu) > +{ > + int err, nr_calls = 0; > + > + err = __cpu_notify(CPU_DOWN_PREPARE, cpu, -1, _calls); > + if (err) { > + nr_calls--; > + __cpu_notify(CPU_DOWN_FAILED, cpu, nr_calls, NULL); > + pr_warn("%s: attempt to take down CPU %u failed\n", > + __func__, cpu); > + } > + return err; > +} > + > /* Take this CPU down. 
*/ > static int take_cpu_down(void *_param) > { > @@ -371,29 +385,9 @@ static int take_cpu_down(void *_param) > return 0; > } > > -/* Requires cpu_add_remove_lock to be held */ > -static int _cpu_down(unsigned int cpu, int tasks_frozen) > +static int takedown_cpu(unsigned int cpu) > { > - int err, nr_calls = 0; > - > - if (num_online_cpus() == 1) > - return -EBUSY; > - > - if (!cpu_online(cpu)) > - return -EINVAL; > - > - cpu_hotplug_begin(); > - > - cpuhp_tasks_frozen = tasks_frozen; > - > - err = __cpu_notify(CPU_DOWN_PREPARE, cpu, -1, _calls); > - if (err) { > - nr_calls--; > - __cpu_notify(CPU_DOWN_FAILED, cpu, nr_calls, NULL); > - pr_warn("%s: attempt to take down CPU %u failed\n", > - __func__, cpu); > - goto out_release; > - } > + int err; > > /* >* By now we've cleared cpu_active_mask, wait for all preempt-disabled > @@ -426,7 +420,7 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen) > /* CPU didn't die: tell everyone. Can't complain. */ > cpu_notify_nofail(CPU_DOWN_FAILED, cpu); > irq_unlock_sparse(); > - goto out_release; > + return err; > } > BUG_ON(cpu_online(cpu)); > > @@ -449,11 +443,40 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen) > /* This actually kills the CPU. */ > __cpu_die(cpu); > > - /* CPU is completely dead: tell everyone. Too late to complain. */ > tick_cleanup_dead_cpu(cpu); > - cpu_notify_nofail(CPU_DEAD, cpu); > + return 0; > +} > > +static int notify_dead(unsigned int cpu) > +{ > + cpu_notify_nofail(CPU_DEAD, cpu);
Re: [tip:smp/hotplug] cpu/hotplug: Restructure cpu_up code
On 3/1/16 2:52 PM, tip-bot for Thomas Gleixner wrote: > Commit-ID: ba997462435f48ad1501320e9da8770fd40c59b1 > Gitweb: http://git.kernel.org/tip/ba997462435f48ad1501320e9da8770fd40c59b1 > Author: Thomas Gleixner <t...@linutronix.de> > AuthorDate: Fri, 26 Feb 2016 18:43:24 + > Committer: Thomas Gleixner <t...@linutronix.de> > CommitDate: Tue, 1 Mar 2016 20:36:53 +0100 > > cpu/hotplug: Restructure cpu_up code > > Split out into separate functions, so we can convert it to a state machine. > > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > Cc: linux-a...@vger.kernel.org > Cc: Rik van Riel <r...@redhat.com> > Cc: Rafael Wysocki <rafael.j.wyso...@intel.com> > Cc: "Srivatsa S. Bhat" <sriva...@mit.edu> > Cc: Peter Zijlstra <pet...@infradead.org> > Cc: Arjan van de Ven <ar...@linux.intel.com> > Cc: Sebastian Siewior <bige...@linutronix.de> > Cc: Rusty Russell <ru...@rustcorp.com.au> > Cc: Steven Rostedt <rost...@goodmis.org> > Cc: Oleg Nesterov <o...@redhat.com> > Cc: Tejun Heo <t...@kernel.org> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Paul McKenney <paul...@linux.vnet.ibm.com> > Cc: Linus Torvalds <torva...@linux-foundation.org> > Cc: Paul Turner <p...@google.com> > Link: http://lkml.kernel.org/r/20160226182340.429389...@linutronix.de > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > --- Reviewed-by: Srivatsa S. Bhat <sriva...@csail.mit.edu> Regards, Srivatsa S. 
Bhat > kernel/cpu.c | 69 > +--- > 1 file changed, 47 insertions(+), 22 deletions(-) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 41a6cb8..15a4136 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -228,6 +228,43 @@ static int cpu_notify(unsigned long val, unsigned int > cpu) > return __cpu_notify(val, cpu, -1, NULL); > } > > +/* Notifier wrappers for transitioning to state machine */ > +static int notify_prepare(unsigned int cpu) > +{ > + int nr_calls = 0; > + int ret; > + > + ret = __cpu_notify(CPU_UP_PREPARE, cpu, -1, _calls); > + if (ret) { > + nr_calls--; > + printk(KERN_WARNING "%s: attempt to bring up CPU %u failed\n", > + __func__, cpu); > + __cpu_notify(CPU_UP_CANCELED, cpu, nr_calls, NULL); > + } > + return ret; > +} > + > +static int notify_online(unsigned int cpu) > +{ > + cpu_notify(CPU_ONLINE, cpu); > + return 0; > +} > + > +static int bringup_cpu(unsigned int cpu) > +{ > + struct task_struct *idle = idle_thread_get(cpu); > + int ret; > + > + /* Arch-specific enabling code. */ > + ret = __cpu_up(cpu, idle); > + if (ret) { > + cpu_notify(CPU_UP_CANCELED, cpu); > + return ret; > + } > + BUG_ON(!cpu_online(cpu)); > + return 0; > +} > + > #ifdef CONFIG_HOTPLUG_CPU > > static void cpu_notify_nofail(unsigned long val, unsigned int cpu) > @@ -481,7 +518,7 @@ void smpboot_thread_init(void) > static int _cpu_up(unsigned int cpu, int tasks_frozen) > { > struct task_struct *idle; > - int ret, nr_calls = 0; > + int ret; > > cpu_hotplug_begin(); > > @@ -496,33 +533,21 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen) > goto out; > } > > + cpuhp_tasks_frozen = tasks_frozen; > + > ret = smpboot_create_threads(cpu); > if (ret) > goto out; > > - cpuhp_tasks_frozen = tasks_frozen; > - > - ret = __cpu_notify(CPU_UP_PREPARE, cpu, -1, _calls); > - if (ret) { > - nr_calls--; > - pr_warn("%s: attempt to bring up CPU %u failed\n", > - __func__, cpu); > - goto out_notify; > - } > - > - /* Arch-specific enabling code. 
*/ > - ret = __cpu_up(cpu, idle); > - > - if (ret != 0) > - goto out_notify; > - BUG_ON(!cpu_online(cpu)); > + ret = notify_prepare(cpu); > + if (ret) > + goto out; > > - /* Now call notifier in preparation. */ > - cpu_notify(CPU_ONLINE, cpu); > + ret = bringup_cpu(cpu); > + if (ret) > + goto out; > > -out_notify: > - if (ret != 0) > - __cpu_notify(CPU_UP_CANCELED, cpu, nr_calls, NULL); > + notify_online(cpu); > out: > cpu_hotplug_done(); > >
Re: [tip:sched/core] irq_work: Remove BUG_ON in irq_work_run()
Hi Ingo, On 07/05/2014 04:13 PM, tip-bot for Peter Zijlstra wrote: > Commit-ID: a77353e5eb56b6c6098bfce59aff1f449451b0b7 > Gitweb: http://git.kernel.org/tip/a77353e5eb56b6c6098bfce59aff1f449451b0b7 > Author: Peter Zijlstra > AuthorDate: Wed, 25 Jun 2014 07:13:07 +0200 > Committer: Ingo Molnar > CommitDate: Sat, 5 Jul 2014 11:17:26 +0200 > > irq_work: Remove BUG_ON in irq_work_run() > I believe this fix has to go into 3.16 itself, since this fixes a CPU hotplug regression on many systems, as reported here: https://lkml.org/lkml/2014/6/24/765 https://lkml.org/lkml/2014/7/1/473 https://lkml.org/lkml/2014/7/4/16 I didn't find this fix in mainline yet, so I thought of sending a note. Thank you! Regards, Srivatsa S. Bhat > Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any > pending IPI callbacks before CPU offline"), which ends up calling > hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which > is not from IRQ context. > > And since that already calls irq_work_run() from the hotplug path, > remove our entire hotplug handling. > > Reported-by: Stephen Warren > Tested-by: Stephen Warren > Reviewed-by: Srivatsa S. Bhat > Cc: Frederic Weisbecker > Cc: Linus Torvalds > Signed-off-by: Peter Zijlstra > Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org > Signed-off-by: Ingo Molnar > --- > kernel/irq_work.c | 46 -- > 1 file changed, 4 insertions(+), 42 deletions(-) > > diff --git a/kernel/irq_work.c b/kernel/irq_work.c > index 4b0a890..e6bcbe7 100644 > --- a/kernel/irq_work.c > +++ b/kernel/irq_work.c > @@ -160,20 +160,14 @@ static void irq_work_run_list(struct llist_head *list) > } > } > > -static void __irq_work_run(void) > -{ > - irq_work_run_list(&__get_cpu_var(raised_list)); > - irq_work_run_list(&__get_cpu_var(lazy_list)); > -} > - > /* > - * Run the irq_work entries on this cpu. Requires to be ran from hardirq > - * context with local IRQs disabled. 
> + * hotplug calls this through: > + * hotplug_cfd() -> flush_smp_call_function_queue() > */ > void irq_work_run(void) > { > - BUG_ON(!in_irq()); > - __irq_work_run(); > + irq_work_run_list(&__get_cpu_var(raised_list)); > + irq_work_run_list(&__get_cpu_var(lazy_list)); > } > EXPORT_SYMBOL_GPL(irq_work_run); > > @@ -189,35 +183,3 @@ void irq_work_sync(struct irq_work *work) > cpu_relax(); > } > EXPORT_SYMBOL_GPL(irq_work_sync); > - > -#ifdef CONFIG_HOTPLUG_CPU > -static int irq_work_cpu_notify(struct notifier_block *self, > -unsigned long action, void *hcpu) > -{ > - long cpu = (long)hcpu; > - > - switch (action) { > - case CPU_DYING: > - /* Called from stop_machine */ > - if (WARN_ON_ONCE(cpu != smp_processor_id())) > - break; > - __irq_work_run(); > - break; > - default: > - break; > - } > - return NOTIFY_OK; > -} > - > -static struct notifier_block cpu_notify; > - > -static __init int irq_work_init_cpu_notifier(void) > -{ > - cpu_notify.notifier_call = irq_work_cpu_notify; > - cpu_notify.priority = 0; > - register_cpu_notifier(_notify); > - return 0; > -} > -device_initcall(irq_work_init_cpu_notifier); > - > -#endif /* CONFIG_HOTPLUG_CPU */ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 1/2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend
On 07/16/2014 06:43 PM, Viresh Kumar wrote: > On 16 July 2014 16:46, Srivatsa S. Bhat wrote: >> Short answer: If the sysfs directory has already been created by cpufreq, >> then yes, it will remain as it is. However, if the online operation failed >> before that, then cpufreq won't know about that CPU at all, and no file will >> be created. >> >> Long answer: >> The existing cpufreq code does all its work (including creating the sysfs >> directories etc) at the CPU_ONLINE stage. This stage is not expected to fail >> (in fact even the core CPU hotplug code in kernel/cpu.c doesn't care for >> error returns at this point). So if a CPU fails to come up in earlier stages >> itself (such as CPU_UP_PREPARE), then cpufreq won't even hear about that CPU, >> and hence no sysfs files will be created/linked. However, if the CPU bringup >> operation fails during the CPU_ONLINE stage after the cpufreq's notifier has >> been invoked, then we do nothing about it and the cpufreq sysfs files will >> remain. > > In short, the problem I mentioned before this para is genuine. And setting > policy->cpu to the first cpu of a mask is indeed a bad idea. > >>> Also, how does suspend/resume work without CONFIG_HOTPLUG_CPU ? >>> What's the sequence of events? >>> >> >> Well, CONFIG_SUSPEND doesn't have an explicit dependency on HOTPLUG_CPU, but >> SMP systems usually use CONFIG_PM_SLEEP_SMP, which sets CONFIG_HOTPLUG_CPU. > > I read usually as *optional* > >> (I guess the reason why CONFIG_SUSPEND doesn't depend on HOTPLUG_CPU is >> because suspend is possible even on uniprocessor systems and hence the >> Kconfig dependency wasn't really justified). > > Again the same question, how do we suspend when HOTPLUG is disabled? From what I understand, if you disable HOTPLUG_CPU and enable CONFIG_SUSPEND and try suspend/resume on an SMP system, the disable_nonboot_cpus() call will return silently without doing anything. Thus, suspend will fail silently and the system might have trouble resuming.
But surprisingly we have never had such bug reports so far! Most probably this is because PM_SLEEP_SMP has a default of y (which in turn selects HOTPLUG_CPU):

config PM_SLEEP_SMP
	def_bool y
	depends on SMP
	depends on ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE
	depends on PM_SLEEP
	select HOTPLUG_CPU

So I guess nobody really tried turning this off on SMP systems and then trying suspend. Then I started looking at the git history and wondered where this Kconfig dependency between SUSPEND and SMP<->HOTPLUG_CPU got messed up. But instead I found that the initial commit itself didn't get the dependency right. Commit 296699de6bdc (Introduce CONFIG_SUSPEND for suspend-to-Ram and standby) introduced all the Kconfig options, and it indeed mentions this in the changelog: "Make HOTPLUG_CPU be selected automatically if SUSPEND or HIBERNATION has been chosen and the kernel is intended for SMP systems". But unfortunately, the code didn't get it right because it made CONFIG_SUSPEND depend on SUSPEND_SMP_POSSIBLE instead of SUSPEND_SMP. In other words, we have had this incorrect dependency all the time! Regards, Srivatsa S. Bhat
Re: [PATCH v3 1/2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend
various places in the code. Otherwise it becomes very hard to follow your thought-flow just by looking at the patch. So please split up the patch further and also make the changelogs useful to review the patch :-) The link that Viresh gave above also did a lot of code reorganization in cpufreq, so it should give you a good example of how to proceed. [...] >> __cpufreq_add_dev(dev, NULL); >> break; >> >> case CPU_DOWN_PREPARE: >> - __cpufreq_remove_dev_prepare(dev, NULL); >> - break; >> - >> - case CPU_POST_DEAD: >> - __cpufreq_remove_dev_finish(dev, NULL); >> - break; >> - >> - case CPU_DOWN_FAILED: >> - __cpufreq_add_dev(dev, NULL); >> + __cpufreq_remove_dev(dev, NULL); > > @Srivatsa: You might want to have a look at this, remove sequence was > separated for some purpose and I am just not able to concentrate enough > to think of that, just too many cases running in my mind :) > Yeah, we had split it into _remove_dev_prepare() and _remove_dev_finish() to avoid a few potential deadlocks. We wanted to call _remove_dev_prepare() in the DOWN_PREPARE stage and then call _remove_dev_finish() (which waits for the kobject refcount to drop) in the POST_DEAD stage. That is, we wanted to do the kobject cleanup after releasing the hotplug lock, and POST_DEAD stage was well-suited for that. Commit 1aee40ac9c8 (cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock) explains this in detail. Saravana, please take a look at that reasoning and ensure that your patch doesn't re-introduce those deadlock possibilities! >> break; >> } >> } > > I am still not sure if everything will work as expected as I seriously doubt > my reviewing capabilities. There might be corner cases which I am still > missing. > Regards, Srivatsa S. 
Bhat
Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend
On 07/16/2014 11:14 AM, Viresh Kumar wrote: > On 15 July 2014 12:28, Srivatsa S. Bhat wrote: >> Wait, allowing an offline CPU to be the policy->cpu (i.e., the CPU which is >> considered as the master of the policy/group) is just absurd. > > Yeah, that was as Absurd as I am :) > I have had my own share of silly ideas over the years; so don't worry, we are all in the same boat ;-) >> The goal of this patchset should be to just de-couple the sysfs >> files/ownership >> from the policy->cpu to an extent where it doesn't matter who owns those >> files, and probably make it easier to do CPU hotplug without having to >> destroy and recreate the files on every hotplug operation. > > I went to that Absurd idea because we thought we can skip playing with > the sysfs nodes on suspend/hotplug. > > And if policy->cpu keeps changing with hotplug, we *may* have to keep > sysfs stuff moving as well. One way to avoid that is by using something > like: policy->sysfs_cpu, but wasn't sure if that's the right path to follow. > Hmm, I understand.. Even I don't have any suggestions as of now, since I haven't spent enough time thinking of alternatives yet. > Lets see what Saravana's new patchset has for us :) > Yep :-) Regards, Srivatsa S. Bhat
Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend
On 07/15/2014 11:05 PM, skan...@codeaurora.org wrote: > > Srivatsa S. Bhat wrote: >> On 07/15/2014 11:06 AM, Saravana Kannan wrote: >>> On 07/14/2014 09:35 PM, Viresh Kumar wrote: >>>> On 15 July 2014 00:38, Saravana Kannan wrote: >>>>> Yeah, it definitely crashes if policy->cpu if an offline cpu. Because >>>>> the >>>>> mutex would be uninitialized if it's stopped after boot or it would >>>>> never >>>>> have been initialized (depending on how you fix policy->cpu at boot). >>>>> >>>>> Look at this snippet on the actual tree and it should be pretty >>>>> evident. >>>> >>>> Yeah, I missed it. So the problem is we initialize timer_mutex's for >>>> policy->cpus. So we need to do that just for policy->cpu and also we >>>> don't >>>> need a per-cpu timer_mutex anymore. >>>> >>> >>> Btw, I tried to take a stab at removing any assumption in cpufreq code >>> about policy->cpu being ONLINE. >> >> Wait, allowing an offline CPU to be the policy->cpu (i.e., the CPU which >> is >> considered as the master of the policy/group) is just absurd. If there is >> no leader, there is no army. We should NOT sacrifice sane semantics for >> the >> sake of simplifying the code. >> >>> There are 160 instances of those of with >>> 23 are in cpufreq.c >>> >> >> And that explains why. It is just *natural* to assume that the CPUs >> governed >> by a policy are online. Especially so for the CPU which is supposed to be >> the policy leader. Let us please not change that - it will become >> counter-intuitive if we do so. [ The other reason is that physical hotplug >> is also possible on some systems... in that case your code might make a >> CPU >> which is not even present (but possible) as the policy->cpu.. 
and great >> 'fun' >> will ensue after that ;-( ] >> >> The goal of this patchset should be to just de-couple the sysfs >> files/ownership >> from the policy->cpu to an extent where it doesn't matter who owns those >> files, and probably make it easier to do CPU hotplug without having to >> destroy and recreate the files on every hotplug operation. >> >> This is exactly why the _implementation_ matters in this particular case - >> if we can't achieve the simplification by keeping sane semantics, then we >> shouldn't do the simplification! >> >> That said, I think we should keep trying - we haven't exhausted all ideas >> yet :-) >> > > I don't think we disagree. To summarize this topic: I tried to keep the > policy->cpu an actual online CPU so as to not break existing semantics in > this patch. Viresh asked "why not fix it at boot?". My response was to > keep it an online CPU and give it a shot in a separate patch if we really > want that. It's too risky to do that in this patch and also not a > mandatory change for this patch. > > I think we can work out the details on the need to fixing policy->cpu at > boot and whether there's even a need for policy->cpu (when we already have > policy->cpus) in a separate thread after the dust settles on this one? > Sure, that sounds good! Regards, Srivatsa S. Bhat
Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend
On 07/15/2014 11:06 AM, Saravana Kannan wrote: > On 07/14/2014 09:35 PM, Viresh Kumar wrote: >> On 15 July 2014 00:38, Saravana Kannan wrote: >>> Yeah, it definitely crashes if policy->cpu if an offline cpu. Because >>> the >>> mutex would be uninitialized if it's stopped after boot or it would >>> never >>> have been initialized (depending on how you fix policy->cpu at boot). >>> >>> Look at this snippet on the actual tree and it should be pretty evident. >> >> Yeah, I missed it. So the problem is we initialize timer_mutex's for >> policy->cpus. So we need to do that just for policy->cpu and also we >> don't >> need a per-cpu timer_mutex anymore. >> > > Btw, I tried to take a stab at removing any assumption in cpufreq code > about policy->cpu being ONLINE. Wait, allowing an offline CPU to be the policy->cpu (i.e., the CPU which is considered as the master of the policy/group) is just absurd. If there is no leader, there is no army. We should NOT sacrifice sane semantics for the sake of simplifying the code. > There are 160 instances of those of with > 23 are in cpufreq.c > And that explains why. It is just *natural* to assume that the CPUs governed by a policy are online. Especially so for the CPU which is supposed to be the policy leader. Let us please not change that - it will become counter-intuitive if we do so. [ The other reason is that physical hotplug is also possible on some systems... in that case your code might make a CPU which is not even present (but possible) as the policy->cpu.. and great 'fun' will ensue after that ;-( ] The goal of this patchset should be to just de-couple the sysfs files/ownership from the policy->cpu to an extent where it doesn't matter who owns those files, and probably make it easier to do CPU hotplug without having to destroy and recreate the files on every hotplug operation. 
This is exactly why the _implementation_ matters in this particular case - if we can't achieve the simplification by keeping sane semantics, then we shouldn't do the simplification! That said, I think we should keep trying - we haven't exhausted all ideas yet :-) Regards, Srivatsa S. Bhat > So, even if we are sure cpufreq.c is fine, it's 137 other uses spread > across all the other files. I definitely don't want to try and fix those > as part of this patch. Way too risky and hard to get the test coverage > it would need. Even some of the acpi cpufreq drivers seem to be making > this assumption. > > Btw, I think v3 is done. I did some testing and it was fine. But made > some minor changes. Will test tomorrow to make sure I didn't break > anything with the minor changes and then send them out. >
Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend
On 07/11/2014 09:48 AM, Saravana Kannan wrote: > The CPUfreq driver moves the cpufreq policy ownership between CPUs when > CPUs within a cluster (CPUs sharing same policy) go ONLINE/OFFLINE. When > moving policy ownership between CPUs, it also moves the cpufreq sysfs > directory between CPUs and also fixes up the symlinks of the other CPUs in > the cluster. > > Also, when all the CPUs in a cluster go OFFLINE, all the sysfs nodes and > directories are deleted, the kobject is released and the policy is freed. > And when the first CPU in a cluster comes up, the policy is reallocated and > initialized, kobject is acquired, the sysfs nodes are created or symlinked, > etc. > > All these steps end up creating unnecessarily complicated code and locking. > There's no real benefit to adding/removing/moving the sysfs nodes and the > policy between CPUs. Other per CPU sysfs directories like power and cpuidle > are left alone during hotplug. So there's some precedence to what this > patch is trying to do. > > This patch simplifies a lot of the code and locking by removing the > adding/removing/moving of policy/sysfs/kobj and just leaves the cpufreq > directory and policy in place irrespective of whether the CPUs are > ONLINE/OFFLINE. > > Leaving the policy, sysfs and kobject in place also brings these additional > benefits: > * Faster suspend/resume. > * Faster hotplug. > * Sysfs file permissions maintained across hotplug without userspace > workarounds. > * Policy settings and governor tunables maintained across suspend/resume > and hotplug. > * Cpufreq stats would be maintained across hotplug for all CPUs and can be > queried even after CPU goes OFFLINE. > > Change-Id: I39c395e1fee8731880c0fd7c8a9c1d83e2e4b8d0 > Tested-by: Stephen Boyd > Signed-off-by: Saravana Kannan > --- > > Preliminary testing has been done. cpufreq directories are getting created > properly. Online/offline of CPUs work. Policies remain unmodifiable from > userspace when all policy CPUs are offline. 
> > Error handling code has NOT been updated. > > I've added a bunch of FIXME comments next to where I'm not sure about the > locking in the existing code. I believe most of the try_lock's were present > to prevent a deadlock between sysfs lock and the cpufreq locks. Now that > the sysfs entries are not touched after creating them, we should be able to > replace most/all of these try_lock's with a normal lock. > > This patch has more room for code simplification, but I would like to get > some acks for the functionality and this code before I do further > simplification. > The idea behind this work is very welcome indeed! IMHO, there is nothing conceptually wrong in maintaining the per-cpu sysfs files across CPU hotplug (as long as we take care to return appropriate error codes if userspace tries to set values using the control files of offline CPUs). So, it really boils down to whether or not we get the implementation right; the idea itself looks fine as of now. Hence, your efforts in making this patch(set) easier to review will certainly help. Perhaps you can simplify the code later, but at this point, splitting up this patch into multiple smaller, reviewable pieces (accompanied by well-written changelogs that explain the intent) is the utmost priority. Just like Viresh, even I had a hard time reviewing all of this in one go. Thank you for taking up this work! Regards, Srivatsa S. Bhat > I should also be able to remove get_online_cpus() in the store function and > replace it with just a check for policy->governor_enabled. That should > theoretically reduce some contention between cpufreq stats check and > hotplug of unrelated CPUs. > > Appreciate all the feedback. 
> > Thanks, > Saravana > > drivers/cpufreq/cpufreq.c | 331 > ++ > 1 file changed, 69 insertions(+), 262 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 62259d2..e350b15 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -859,16 +859,16 @@ void cpufreq_sysfs_remove_file(const struct attribute > *attr) > } > EXPORT_SYMBOL(cpufreq_sysfs_remove_file); > > -/* symlink affected CPUs */ > +/* symlink related CPUs */ > static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy) > { > - unsigned int j; > + unsigned int j, first_cpu = cpumask_first(policy->related_cpus); > int ret = 0; > > - for_each_cpu(j, policy->cpus) { > + for_each_cpu(j, policy->related_cpus) { > struct device *cpu_dev; > > - if (j == policy->cpu) > + if (j == first_cpu) > continue; > >
Re: [PATCH v2] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend
On 07/11/2014 09:48 AM, Saravana Kannan wrote:

The CPUfreq driver moves the cpufreq policy ownership between CPUs when CPUs within a cluster (CPUs sharing the same policy) go ONLINE/OFFLINE. When moving policy ownership between CPUs, it also moves the cpufreq sysfs directory between CPUs and fixes up the symlinks of the other CPUs in the cluster.

Also, when all the CPUs in a cluster go OFFLINE, all the sysfs nodes and directories are deleted, the kobject is released and the policy is freed. And when the first CPU in a cluster comes up, the policy is reallocated and initialized, the kobject is acquired, the sysfs nodes are created or symlinked, etc.

All these steps end up creating unnecessarily complicated code and locking. There's no real benefit to adding/removing/moving the sysfs nodes and the policy between CPUs. Other per-CPU sysfs directories like power and cpuidle are left alone during hotplug, so there's some precedent for what this patch is trying to do.

This patch simplifies a lot of the code and locking by removing the adding/removing/moving of policy/sysfs/kobj and just leaves the cpufreq directory and policy in place irrespective of whether the CPUs are ONLINE/OFFLINE. Leaving the policy, sysfs and kobject in place also brings these additional benefits:

* Faster suspend/resume.
* Faster hotplug.
* Sysfs file permissions maintained across hotplug without userspace workarounds.
* Policy settings and governor tunables maintained across suspend/resume and hotplug.
* Cpufreq stats would be maintained across hotplug for all CPUs and can be queried even after a CPU goes OFFLINE.

Change-Id: I39c395e1fee8731880c0fd7c8a9c1d83e2e4b8d0
Tested-by: Stephen Boyd sb...@codeaurora.org
Signed-off-by: Saravana Kannan skan...@codeaurora.org
---
Preliminary testing has been done. cpufreq directories are getting created properly. Online/offline of CPUs work. Policies remain unmodifiable from userspace when all policy CPUs are offline.

Error handling code has NOT been updated.
Re: [PATCH] cpufreq: report driver's successful {un}registration
On 06/26/2014 02:21 PM, Viresh Kumar wrote: > We do report driver's successful {un}registration from cpufreq core, but is > done with pr_debug() and so this doesn't appear in boot logs. > > Convert this to pr_info() to make it visible in logs. > While at it, let's also standardize those messages, since they will be more visible now. > Signed-off-by: Viresh Kumar > --- > drivers/cpufreq/cpufreq.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 62259d2..63d8f8f 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -2468,7 +2468,7 @@ int cpufreq_register_driver(struct cpufreq_driver > *driver_data) > } > > register_hotcpu_notifier(&cpufreq_cpu_notifier); > - pr_debug("driver %s up and running\n", driver_data->name); > + pr_info("driver %s up and running\n", driver_data->name); How about "Registered cpufreq driver: %s\n" > > return 0; > err_if_unreg: > @@ -2499,7 +2499,7 @@ int cpufreq_unregister_driver(struct cpufreq_driver > *driver) > if (!cpufreq_driver || (driver != cpufreq_driver)) > return -EINVAL; > > - pr_debug("unregistering driver %s\n", driver->name); > + pr_info("unregistering driver %s\n", driver->name); And "Unregistered cpufreq driver: %s\n" (Also, it's probably a good idea to have 2 prints here, just like we have for cpufreq_register_driver() - one with pr_debug() at the beginning of the function which tells us that we are *about* to register the driver, and then a second print with pr_info() at the end of the function that tells us that we successfully registered the driver. We can do the same thing for unregistration as well.) > > subsys_interface_unregister(&cpufreq_interface); > if (cpufreq_boost_supported()) > Regards, Srivatsa S.
Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v7 2/2] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
On 06/25/2014 09:12 PM, Sasha Levin wrote: > On 05/26/2014 07:08 AM, Srivatsa S. Bhat wrote: >> During CPU offline, in stop-machine, we don't enforce any rule in the >> _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the >> other >> CPUs disable their local interrupts. Hence, we can encounter a scenario as >> depicted below, in which IPIs are sent by the other CPUs to the CPU going >> offline (while it is *still* online), but the outgoing CPU notices them only >> *after* it has gone offline. >> [...] > Hi all, > > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel I've stumbled on the following spew: > Thanks for the bug report. Please test if this patch fixes the problem for you: https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=timers/nohz=921d8b81281ecdca686369f52165d04fa3505bd7 Regards, Srivatsa S. Bhat > [ 1982.600053] kernel BUG at kernel/irq_work.c:175! > [ 1982.600053] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC > [ 1982.600053] Dumping ftrace buffer: > [ 1982.600053](ftrace buffer empty) > [ 1982.600053] Modules linked in: > [ 1982.600053] CPU: 14 PID: 168 Comm: migration/14 Not tainted > 3.16.0-rc2-next-20140624-sasha-00024-g332b58d #726 > [ 1982.600053] task: 88036a5a3000 ti: 88036a5ac000 task.ti: > 88036a5ac000 > [ 1982.600053] RIP: irq_work_run (kernel/irq_work.c:175 (discriminator 1)) > [ 1982.600053] RSP: :88036a5afbe0 EFLAGS: 00010046 > [ 1982.600053] RAX: 8001 RBX: RCX: > 0008 > [ 1982.600053] RDX: 000e RSI: af9185fb RDI: > > [ 1982.600053] RBP: 88036a5afc08 R08: 00099224 R09: > > [ 1982.600053] R10: R11: 0001 R12: > 88036afd8400 > [ 1982.600053] R13: R14: b0cf8120 R15: > b0cce5d0 > [ 1982.600053] FS: () GS:88036ae0() > knlGS: > [ 1982.600053] CS: 0010 DS: ES: CR0: 8005003b > [ 1982.600053] CR2: 019485d0 CR3: 0002c7c8f000 CR4: > 06a0 > [ 1982.600053] Stack: > [ 1982.600053] ab20fbb5 0082 88036afd8440 > > [ 1982.600053] 0001 88036a5afc28 ab20fca7 > > [ 1982.600053] 
ffef 88036a5afc78 ab19c58e > 000e > [ 1982.600053] Call Trace: > [ 1982.600053] ? flush_smp_call_function_queue (kernel/smp.c:263) > [ 1982.600053] hotplug_cfd (kernel/smp.c:81) > [ 1982.600053] notifier_call_chain (kernel/notifier.c:95) > [ 1982.600053] __raw_notifier_call_chain (kernel/notifier.c:395) > [ 1982.600053] __cpu_notify (kernel/cpu.c:202) > [ 1982.600053] cpu_notify (kernel/cpu.c:211) > [ 1982.600053] take_cpu_down (./arch/x86/include/asm/current.h:14 > kernel/cpu.c:312) > [ 1982.600053] multi_cpu_stop (kernel/stop_machine.c:201) > [ 1982.600053] ? __stop_cpus (kernel/stop_machine.c:170) > [ 1982.600053] cpu_stopper_thread (kernel/stop_machine.c:474) > [ 1982.600053] ? put_lock_stats.isra.12 (./arch/x86/include/asm/preempt.h:98 > kernel/locking/lockdep.c:254) > [ 1982.600053] ? _raw_spin_unlock_irqrestore > (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 > kernel/locking/spinlock.c:191) > [ 1982.600053] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) > [ 1982.600053] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 > kernel/locking/lockdep.c:2599) > [ 1982.600053] smpboot_thread_fn (kernel/smpboot.c:160) > [ 1982.600053] ? __smpboot_create_thread (kernel/smpboot.c:105) > [ 1982.600053] kthread (kernel/kthread.c:210) > [ 1982.600053] ? wait_for_completion (kernel/sched/completion.c:77 > kernel/sched/completion.c:93 kernel/sched/completion.c:101 > kernel/sched/completion.c:122) > [ 1982.600053] ? kthread_create_on_node (kernel/kthread.c:176) > [ 1982.600053] ret_from_fork (arch/x86/kernel/entry_64.S:349) > [ 1982.600053] ? 
kthread_create_on_node (kernel/kthread.c:176) > [ 1982.600053] Code: 00 00 00 00 e8 63 ff ff ff 48 83 c4 08 b8 01 00 00 00 5b > 5d c3 b8 01 00 00 00 c3 90 65 8b 04 25 a0 da 00 00 a9 00 00 0f 00 75 09 <0f> > 0b 0f 1f 80 00 00 00 00 55 48 89 e5 e8 2f ff ff ff 5d c3 66 > All code > >0: 00 00 add%al,(%rax) >2: 00 00 add%al,(%rax) >4: e8 63 ff ff ff callq 0xff6c >9: 48 83 c4 08 add$0x8,%rsp >d: b8 01 00 00 00 mov$0x1,%eax > 12: 5b pop%rbx > 13: 5d pop%rbp > 14: c3
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 10:08 PM, Peter Zijlstra wrote: > On Wed, Jun 25, 2014 at 10:23:21AM -0600, Stephen Warren wrote: >> On 06/25/2014 04:19 AM, Peter Zijlstra wrote: >>> On Wed, Jun 25, 2014 at 03:24:11PM +0530, Srivatsa S. Bhat wrote: >>>> Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run >>>> indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify() >>>> doesn't need to invoke it again, AFAIU. So perhaps we can get rid of >>>> irq_work_cpu_notify() altogether? >>> >>> Just so... >>> >>> getting up at 6am and sitting in an airport terminal doesn't seem to >>> agree with me; any more silly fail here? >>> >>> --- >>> Subject: irq_work: Remove BUG_ON in irq_work_run() >>> From: Peter Zijlstra >>> Date: Wed Jun 25 07:13:07 CEST 2014 >>> >>> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any >>> pending IPI callbacks before CPU offline"), which ends up calling >>> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which >>> is not from IRQ context. >>> >>> And since that already calls irq_work_run() from the hotplug path, >>> remove our entire hotplug handling. >> >> Tested-by: Stephen Warren >> >> [with the s/static// already mentioned in this thread, obviously:-)] > > Right; I pushed out a fixed version right before loosing my tubes at the > airport :-) > > https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=timers/nohz=921d8b81281ecdca686369f52165d04fa3505bd7 > This version looks good. Reviewed-by: Srivatsa S. Bhat Regards, Srivatsa S. Bhat > I've not gotten wu build bot spam on it so it must be good ;-) > > In any case, I'll add your tested-by and update later this evening. >
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 03:49 PM, Peter Zijlstra wrote: > On Wed, Jun 25, 2014 at 03:24:11PM +0530, Srivatsa S. Bhat wrote: >> Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run >> indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify() >> doesn't need to invoke it again, AFAIU. So perhaps we can get rid of >> irq_work_cpu_notify() altogether? > > Just so... > > getting up at 6am and sitting in an airport terminal doesn't seem to > agree with me; Haha :-) > any more silly fail here? > A few minor nits below.. > --- > Subject: irq_work: Remove BUG_ON in irq_work_run() > From: Peter Zijlstra > Date: Wed Jun 25 07:13:07 CEST 2014 > > Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any > pending IPI callbacks before CPU offline"), which ends up calling > hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which > is not from IRQ context. > > And since that already calls irq_work_run() from the hotplug path, > remove our entire hotplug handling. > > Cc: Frederic Weisbecker > Reported-by: Stephen Warren > Signed-off-by: Peter Zijlstra > Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org > --- > kernel/irq_work.c | 48 +--- > 1 file changed, 5 insertions(+), 43 deletions(-) > > Index: linux-2.6/kernel/irq_work.c > === > --- linux-2.6.orig/kernel/irq_work.c > +++ linux-2.6/kernel/irq_work.c > @@ -160,20 +160,14 @@ static void irq_work_run_list(struct lli > } > } > > -static void __irq_work_run(void) > -{ > - irq_work_run_list(&__get_cpu_var(raised_list)); > - irq_work_run_list(&__get_cpu_var(lazy_list)); > -} > - > /* > - * Run the irq_work entries on this cpu. Requires to be ran from hardirq > - * context with local IRQs disabled. 
> + * hotplug calls this through: > + * hotplug_cfs() -> flush_smp_call_function_queue() s/hotplug_cfs/hotplug_cfd > */ > -void irq_work_run(void) > +static void irq_work_run(void) s/static// > { > - BUG_ON(!in_irq()); > - __irq_work_run(); > + irq_work_run_list(&__get_cpu_var(raised_list)); > + irq_work_run_list(&__get_cpu_var(lazy_list)); > } > EXPORT_SYMBOL_GPL(irq_work_run); With those 2 changes, everything looks good to me. Regards, Srivatsa S. Bhat
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 03:20 PM, Srivatsa S. Bhat wrote: > On 06/25/2014 03:09 PM, Peter Zijlstra wrote: >> On Wed, Jun 25, 2014 at 11:36:57AM +0200, Peter Zijlstra wrote: >>> On Wed, Jun 25, 2014 at 12:07:05PM +0530, Srivatsa S. Bhat wrote: >>>> I don't think irqs_disabled() is the problematic condition, since >>>> hotplug_cfg() invokes irq_work_run() from CPU_DYING context (which has >>>> irqs disabled). I guess you meant to remove the in_irq() check inside >>>> irq_work_run() instead? >>> >>> Yes, clearly I should not get up at 6am.. :-) Let me go do a new one. >> >> --- >> Subject: irq_work: Remove BUG_ON in irq_work_run() >> From: Peter Zijlstra >> Date: Wed Jun 25 07:13:07 CEST 2014 >> >> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any >> pending IPI callbacks before CPU offline"), which ends up calling >> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which >> is not from IRQ context. >> >> Cc: Frederic Weisbecker >> Reported-by: Stephen Warren >> Signed-off-by: Peter Zijlstra >> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org >> --- >> kernel/irq_work.c | 12 +--- >> 1 file changed, 1 insertion(+), 11 deletions(-) >> >> Index: linux-2.6/kernel/irq_work.c >> === >> --- linux-2.6.orig/kernel/irq_work.c >> +++ linux-2.6/kernel/irq_work.c >> @@ -160,21 +160,11 @@ static void irq_work_run_list(struct lli >> } >> } >> >> -static void __irq_work_run(void) > > Hmm, irq_work_cpu_notify() calls __irq_work_run() in the CPU_DYING > phase, to by-pass BUG_ON(!in_irq()). How about doing the same thing > from hotplug_cfd() as well? > Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify() doesn't need to invoke it again, AFAIU. So perhaps we can get rid of irq_work_cpu_notify() altogether? Regards, Srivatsa S. 
Bhat >> +static void irq_work_run(void) >> { >> irq_work_run_list(&__get_cpu_var(raised_list)); >> irq_work_run_list(&__get_cpu_var(lazy_list)); >> } >> - >> -/* >> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq >> - * context with local IRQs disabled. >> - */ >> -void irq_work_run(void) >> -{ >> -BUG_ON(!in_irq()); >> -__irq_work_run(); >> -} >> EXPORT_SYMBOL_GPL(irq_work_run); >> >> /* >> >
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 03:09 PM, Peter Zijlstra wrote: > On Wed, Jun 25, 2014 at 11:36:57AM +0200, Peter Zijlstra wrote: >> On Wed, Jun 25, 2014 at 12:07:05PM +0530, Srivatsa S. Bhat wrote: >>> I don't think irqs_disabled() is the problematic condition, since >>> hotplug_cfg() invokes irq_work_run() from CPU_DYING context (which has >>> irqs disabled). I guess you meant to remove the in_irq() check inside >>> irq_work_run() instead? >> >> Yes, clearly I should not get up at 6am.. :-) Let me go do a new one. > > --- > Subject: irq_work: Remove BUG_ON in irq_work_run() > From: Peter Zijlstra > Date: Wed Jun 25 07:13:07 CEST 2014 > > Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any > pending IPI callbacks before CPU offline"), which ends up calling > hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which > is not from IRQ context. > > Cc: Frederic Weisbecker > Reported-by: Stephen Warren > Signed-off-by: Peter Zijlstra > Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org > --- > kernel/irq_work.c | 12 +--- > 1 file changed, 1 insertion(+), 11 deletions(-) > > Index: linux-2.6/kernel/irq_work.c > === > --- linux-2.6.orig/kernel/irq_work.c > +++ linux-2.6/kernel/irq_work.c > @@ -160,21 +160,11 @@ static void irq_work_run_list(struct lli > } > } > > -static void __irq_work_run(void) Hmm, irq_work_cpu_notify() calls __irq_work_run() in the CPU_DYING phase, to by-pass BUG_ON(!in_irq()). How about doing the same thing from hotplug_cfd() as well? > +static void irq_work_run(void) > { > irq_work_run_list(&__get_cpu_var(raised_list)); > irq_work_run_list(&__get_cpu_var(lazy_list)); > } > - > -/* > - * Run the irq_work entries on this cpu. Requires to be ran from hardirq > - * context with local IRQs disabled. > - */ > -void irq_work_run(void) > -{ > - BUG_ON(!in_irq()); > - __irq_work_run(); > -} > EXPORT_SYMBOL_GPL(irq_work_run); > > /* > Regards, Srivatsa S. 
Bhat
Re: [migration] kernel BUG at kernel/irq_work.c:175!
On 06/25/2014 03:01 PM, Fengguang Wu wrote: > Greetings, > > 0day kernel testing robot got the below dmesg and the first bad commit is > I think this is the same issue as the one reported by Stephen Warren here: https://lkml.org/lkml/2014/6/24/765 Peter Zijlstra is working on a fix for that. Regards, Srivatsa S. Bhat > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > commit 68c90b2c635f18ad51ae7440162f6c082ea1288d > Merge: f08af6f ec11f8c > Author: Stephen Rothwell > AuthorDate: Mon Jun 23 14:12:48 2014 +1000 > > Merge branch 'akpm-current/current' > > +-++++---+ > | | f08af6fa87 | ec11f8c81f | 68c90b2c63 | > next-20140623 | > +-++++---+ > | boot_successes | 60 | 60 | 0 | 0 >| > | boot_failures | 0 | 0 | 20 | 13 >| > | kernel_BUG_at_kernel/irq_work.c | 0 | 0 | 20 | 13 >| > | invalid_opcode | 0 | 0 | 20 | 13 >| > | RIP:irq_work_run| 0 | 0 | 20 | 13 >| > | backtrace:smpboot_thread_fn | 0 | 0 | 20 | 13 >| > +-++++---+ > > [2.194744] EDD information not available. > [2.195290] Unregister pv shared memory for cpu 0 > [2.206025] [ cut here ] > [2.206025] kernel BUG at kernel/irq_work.c:175! 
> [2.206025] invalid opcode: [#1] SMP DEBUG_PAGEALLOC > [2.206025] CPU: 0 PID: 9 Comm: migration/0 Not tainted > 3.16.0-rc2-02039-g68c90b2 #1 > [2.206025] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [2.206025] task: 88001219a7e0 ti: 8800121a4000 task.ti: > 8800121a4000 > [2.206025] RIP: 0010:[] [] > irq_work_run+0xf/0x1c > [2.206025] RSP: :8800121a7c48 EFLAGS: 00010046 > [2.206025] RAX: 8001 RBX: RCX: > 0005 > [2.206025] RDX: RSI: 0008 RDI: > > [2.206025] RBP: 8800121a7c68 R08: 0002 R09: > 0001 > [2.206025] R10: 810e2a10 R11: 810b9de3 R12: > 880012412340 > [2.206025] R13: R14: R15: > 81c83e50 > [2.206025] FS: () GS:88001240() > knlGS: > [2.206025] CS: 0010 DS: ES: CR0: 8005003b > [2.206025] CR2: CR3: 01c0c000 CR4: > 06b0 > [2.206025] Stack: > [2.206025] 810e87e0 880012412380 fff0 > 81c81ba0 > [2.206025] 8800121a7c88 810e88f0 0001 > fff0 > [2.206025] 8800121a7cd0 810b6e23 > 0008 > [2.206025] Call Trace: > [2.206025] [] ? > flush_smp_call_function_queue+0xa4/0x107 > [2.206025] [] hotplug_cfd+0xad/0xbb > [2.206025] [] notifier_call_chain+0x68/0x8e > [2.206025] [] __raw_notifier_call_chain+0x9/0xb > [2.206025] [] __cpu_notify+0x1b/0x32 > [2.206025] [] cpu_notify+0xe/0x10 > [2.206025] [] take_cpu_down+0x22/0x35 > [2.206025] [] multi_cpu_stop+0x8c/0xe2 > [2.206025] [] ? cpu_stopper_thread+0x126/0x126 > [2.206025] [] cpu_stopper_thread+0x8d/0x126 > [2.206025] [] ? lock_acquire+0x94/0x9d > [2.206025] [] ? _raw_spin_unlock_irqrestore+0x40/0x55 > [2.206025] [] ? trace_hardirqs_on_caller+0x171/0x18d > [2.206025] [] ? _raw_spin_unlock_irqrestore+0x48/0x55 > [2.206025] [] smpboot_thread_fn+0x182/0x1a0 > [2.206025] [] ? in_egroup_p+0x2e/0x2e > [2.206025] [] kthread+0xcd/0xd5 > [2.206025] [] ? __kthread_parkme+0x5c/0x5c > [2.206025] [] ret_from_fork+0x7c/0xb0 > [2.206025] [] ? 
__kthread_parkme+0x5c/0x5c > [2.206025] Code: 48 c7 c7 65 cd b0 81 e8 43 20 fa ff c6 05 50 e1 c9 00 01 > eb 02 31 db 88 d8 5b 5d c3 65 8b 04 25 10 b8 00 00 a9 00 00 0f 00 75 02 <0f> > 0b 55 48 89 e5 e8 b5 fd ff ff 5d c3 55 48 89 e5 53 48 89 fb > [2.206025] RIP [] irq_work_run+0xf/0x1c > [2.206025] RSP > [2.206025] ---[ end trace f7f1564c3a1f35d0 ]--- > [2.206025] note: migration/0[9] exited with preempt_count 1 > > git bisect start 58ae500a03a6bf68eee323c342431bfdd3f460b6 > f08af6fa87ea33
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 10:47 AM, Peter Zijlstra wrote: > On Wed, Jun 25, 2014 at 07:12:34AM +0200, Peter Zijlstra wrote: >> On Tue, Jun 24, 2014 at 02:33:41PM -0600, Stephen Warren wrote: >>> On 06/10/2014 09:15 AM, Frederic Weisbecker wrote: >>>> irq work currently only supports local callbacks. However its code >>>> is mostly ready to run remote callbacks and we have some potential user. [...] >> Right you are.. I think I'll just remove the BUG_ON(), Frederic? > > Something a little so like: > > --- > Subject: irq_work: Remove BUG_ON in irq_work_run_list() I think this should be irq_work_run(), see below... > From: Peter Zijlstra > Date: Wed Jun 25 07:13:07 CEST 2014 > > Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any > pending IPI callbacks before CPU offline"), which ends up calling > hotplug_cfd()->flush_smp_call_function_queue()->run_irq_work(), which s/run_irq_work/irq_work_run > is not from IRQ context. > > Cc: Frederic Weisbecker > Reported-by: Stephen Warren > Signed-off-by: Peter Zijlstra > --- > kernel/irq_work.c |2 -- > 1 file changed, 2 deletions(-) > > --- a/kernel/irq_work.c > +++ b/kernel/irq_work.c > @@ -130,8 +130,6 @@ static void irq_work_run_list(struct lli > struct irq_work *work; > struct llist_node *llnode; > > - BUG_ON(!irqs_disabled()); > - I don't think irqs_disabled() is the problematic condition, since hotplug_cfd() invokes irq_work_run() from CPU_DYING context (which has irqs disabled). I guess you meant to remove the in_irq() check inside irq_work_run() instead? Regards, Srivatsa S. Bhat > if (llist_empty(list)) > return; >
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 03:09 PM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 11:36:57AM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 25, 2014 at 12:07:05PM +0530, Srivatsa S. Bhat wrote:
>>> I don't think irqs_disabled() is the problematic condition, since
>>> hotplug_cfd() invokes irq_work_run() from CPU_DYING context (which has
>>> irqs disabled). I guess you meant to remove the in_irq() check inside
>>> irq_work_run() instead?
>>
>> Yes, clearly I should not get up at 6am.. :-) Let me go do a new one.
>
> ---
> Subject: irq_work: Remove BUG_ON in irq_work_run()
> From: Peter Zijlstra pet...@infradead.org
> Date: Wed Jun 25 07:13:07 CEST 2014
>
> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
> pending IPI callbacks before CPU offline"), which ends up calling
> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which is
> not from IRQ context.
>
> Cc: Frederic Weisbecker fweis...@gmail.com
> Reported-by: Stephen Warren swar...@wwwdotorg.org
> Signed-off-by: Peter Zijlstra pet...@infradead.org
> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
> ---
>  kernel/irq_work.c | 12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
>
> Index: linux-2.6/kernel/irq_work.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq_work.c
> +++ linux-2.6/kernel/irq_work.c
> @@ -160,21 +160,11 @@ static void irq_work_run_list(struct lli
>  	}
>  }
>
> -static void __irq_work_run(void)

Hmm, irq_work_cpu_notify() calls __irq_work_run() in the CPU_DYING phase,
to by-pass BUG_ON(!in_irq()). How about doing the same thing from
hotplug_cfd() as well?

> +static void irq_work_run(void)
>  {
>  	irq_work_run_list(__get_cpu_var(raised_list));
>  	irq_work_run_list(__get_cpu_var(lazy_list));
>  }
> -
> -/*
> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> - * context with local IRQs disabled.
> - */
> -void irq_work_run(void)
> -{
> -	BUG_ON(!in_irq());
> -	__irq_work_run();
> -}
>  EXPORT_SYMBOL_GPL(irq_work_run);
>
>  /*

Regards,
Srivatsa S.
Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 03:20 PM, Srivatsa S. Bhat wrote:
> On 06/25/2014 03:09 PM, Peter Zijlstra wrote:
>> On Wed, Jun 25, 2014 at 11:36:57AM +0200, Peter Zijlstra wrote:
>>> On Wed, Jun 25, 2014 at 12:07:05PM +0530, Srivatsa S. Bhat wrote:
>>>> I don't think irqs_disabled() is the problematic condition, since
>>>> hotplug_cfd() invokes irq_work_run() from CPU_DYING context (which has
>>>> irqs disabled). I guess you meant to remove the in_irq() check inside
>>>> irq_work_run() instead?
>>>
>>> Yes, clearly I should not get up at 6am.. :-) Let me go do a new one.
>>
>> ---
>> Subject: irq_work: Remove BUG_ON in irq_work_run()
>> From: Peter Zijlstra pet...@infradead.org
>> Date: Wed Jun 25 07:13:07 CEST 2014
>>
>> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
>> pending IPI callbacks before CPU offline"), which ends up calling
>> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which is
>> not from IRQ context.
>>
>> Cc: Frederic Weisbecker fweis...@gmail.com
>> Reported-by: Stephen Warren swar...@wwwdotorg.org
>> Signed-off-by: Peter Zijlstra pet...@infradead.org
>> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
>> ---
>>  kernel/irq_work.c | 12 +-----------
>>  1 file changed, 1 insertion(+), 11 deletions(-)
>>
>> Index: linux-2.6/kernel/irq_work.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/irq_work.c
>> +++ linux-2.6/kernel/irq_work.c
>> @@ -160,21 +160,11 @@ static void irq_work_run_list(struct lli
>>  	}
>>  }
>>
>> -static void __irq_work_run(void)
>
> Hmm, irq_work_cpu_notify() calls __irq_work_run() in the CPU_DYING phase,
> to by-pass BUG_ON(!in_irq()). How about doing the same thing from
> hotplug_cfd() as well?

Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
doesn't need to invoke it again, AFAIU.

So perhaps we can get rid of irq_work_cpu_notify() altogether?

Regards,
Srivatsa S.
Bhat

>> +static void irq_work_run(void)
>>  {
>>  	irq_work_run_list(__get_cpu_var(raised_list));
>>  	irq_work_run_list(__get_cpu_var(lazy_list));
>>  }
>> -
>> -/*
>> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
>> - * context with local IRQs disabled.
>> - */
>> -void irq_work_run(void)
>> -{
>> -	BUG_ON(!in_irq());
>> -	__irq_work_run();
>> -}
>>  EXPORT_SYMBOL_GPL(irq_work_run);
>>
>>  /*
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 03:49 PM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 03:24:11PM +0530, Srivatsa S. Bhat wrote:
>> Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
>> indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
>> doesn't need to invoke it again, AFAIU.
>>
>> So perhaps we can get rid of irq_work_cpu_notify() altogether?
>
> Just so... getting up at 6am and sitting in an airport terminal doesn't
> seem to agree with me;

Haha :-)

> any more silly fail here?

A few minor nits below..

> ---
> Subject: irq_work: Remove BUG_ON in irq_work_run()
> From: Peter Zijlstra pet...@infradead.org
> Date: Wed Jun 25 07:13:07 CEST 2014
>
> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
> pending IPI callbacks before CPU offline"), which ends up calling
> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which is
> not from IRQ context.
>
> And since that already calls irq_work_run() from the hotplug path, remove
> our entire hotplug handling.
>
> Cc: Frederic Weisbecker fweis...@gmail.com
> Reported-by: Stephen Warren swar...@wwwdotorg.org
> Signed-off-by: Peter Zijlstra pet...@infradead.org
> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agi...@git.kernel.org
> ---
>  kernel/irq_work.c | 48 +++++-------------------------------------------
>  1 file changed, 5 insertions(+), 43 deletions(-)
>
> Index: linux-2.6/kernel/irq_work.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq_work.c
> +++ linux-2.6/kernel/irq_work.c
> @@ -160,20 +160,14 @@ static void irq_work_run_list(struct lli
>  	}
>  }
>
> -static void __irq_work_run(void)
> -{
> -	irq_work_run_list(__get_cpu_var(raised_list));
> -	irq_work_run_list(__get_cpu_var(lazy_list));
> -}
> -
>  /*
> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> - * context with local IRQs disabled.
> + * hotplug calls this through:
> + *   hotplug_cfs() -> flush_smp_call_function_queue()

s/hotplug_cfs/hotplug_cfd

>   */
> -void irq_work_run(void)
> +static void irq_work_run(void)

s/static//

>  {
> -	BUG_ON(!in_irq());
> -	__irq_work_run();
> +	irq_work_run_list(__get_cpu_var(raised_list));
> +	irq_work_run_list(__get_cpu_var(lazy_list));
>  }
>  EXPORT_SYMBOL_GPL(irq_work_run);

With those 2 changes, everything looks good to me.

Regards,
Srivatsa S. Bhat
Re: [PATCH 2/6] irq_work: Implement remote queueing
On 06/25/2014 10:08 PM, Peter Zijlstra wrote:
> On Wed, Jun 25, 2014 at 10:23:21AM -0600, Stephen Warren wrote:
>> On 06/25/2014 04:19 AM, Peter Zijlstra wrote:
>>> On Wed, Jun 25, 2014 at 03:24:11PM +0530, Srivatsa S. Bhat wrote:
>>>> Wait, that was a stupid idea. hotplug_cfd() already invokes irq_work_run
>>>> indirectly via flush_smp_call_function_queue(). So irq_work_cpu_notify()
>>>> doesn't need to invoke it again, AFAIU.
>>>>
>>>> So perhaps we can get rid of irq_work_cpu_notify() altogether?
>>>
>>> Just so... getting up at 6am and sitting in an airport terminal doesn't
>>> seem to agree with me; any more silly fail here?
>>>
>>> ---
>>> Subject: irq_work: Remove BUG_ON in irq_work_run()
>>> From: Peter Zijlstra pet...@infradead.org
>>> Date: Wed Jun 25 07:13:07 CEST 2014
>>>
>>> Because of a collision with 8d056c48e486 ("CPU hotplug, smp: flush any
>>> pending IPI callbacks before CPU offline"), which ends up calling
>>> hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which is
>>> not from IRQ context.
>>>
>>> And since that already calls irq_work_run() from the hotplug path, remove
>>> our entire hotplug handling.
>>
>> Tested-by: Stephen Warren swar...@nvidia.com
>> [with the s/static// already mentioned in this thread, obviously :-)]
>
> Right; I pushed out a fixed version right before losing my tubes at the
> airport :-)
>
> https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=timers/nohz&id=921d8b81281ecdca686369f52165d04fa3505bd7

This version looks good.

Reviewed-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

Regards,
Srivatsa S. Bhat

> I've not gotten Wu build bot spam on it so it must be good ;-)
>
> In any case, I'll add your tested-by and update later this evening.
Re: [PATCH v7 2/2] CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
On 06/25/2014 09:12 PM, Sasha Levin wrote:
> On 05/26/2014 07:08 AM, Srivatsa S. Bhat wrote:
>> During CPU offline, in stop-machine, we don't enforce any rule in the
>> _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
>> the other CPUs disable their local interrupts. Hence, we can encounter
>> a scenario as depicted below, in which IPIs are sent by the other CPUs
>> to the CPU going offline (while it is *still* online), but the outgoing
>> CPU notices them only *after* it has gone offline.
>>
>> [...]
>
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest running the latest
> -next kernel I've stumbled on the following spew:

Thanks for the bug report. Please test if this patch fixes the problem for
you:

https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=timers/nohz&id=921d8b81281ecdca686369f52165d04fa3505bd7

Regards,
Srivatsa S. Bhat

[ 1982.600053] kernel BUG at kernel/irq_work.c:175! [ 1982.600053] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 1982.600053] Dumping ftrace buffer: [ 1982.600053](ftrace buffer empty) [ 1982.600053] Modules linked in: [ 1982.600053] CPU: 14 PID: 168 Comm: migration/14 Not tainted 3.16.0-rc2-next-20140624-sasha-00024-g332b58d #726 [ 1982.600053] task: 88036a5a3000 ti: 88036a5ac000 task.ti: 88036a5ac000 [ 1982.600053] RIP: irq_work_run (kernel/irq_work.c:175 (discriminator 1)) [ 1982.600053] RSP: :88036a5afbe0 EFLAGS: 00010046 [ 1982.600053] RAX: 8001 RBX: RCX: 0008 [ 1982.600053] RDX: 000e RSI: af9185fb RDI: [ 1982.600053] RBP: 88036a5afc08 R08: 00099224 R09: [ 1982.600053] R10: R11: 0001 R12: 88036afd8400 [ 1982.600053] R13: R14: b0cf8120 R15: b0cce5d0 [ 1982.600053] FS: () GS:88036ae0() knlGS: [ 1982.600053] CS: 0010 DS: ES: CR0: 8005003b [ 1982.600053] CR2: 019485d0 CR3: 0002c7c8f000 CR4: 06a0 [ 1982.600053] Stack: [ 1982.600053] ab20fbb5 0082 88036afd8440 [ 1982.600053] 0001 88036a5afc28 ab20fca7 [ 1982.600053] ffef 88036a5afc78 ab19c58e 000e [ 1982.600053] Call Trace: [ 1982.600053] ?
flush_smp_call_function_queue (kernel/smp.c:263) [ 1982.600053] hotplug_cfd (kernel/smp.c:81) [ 1982.600053] notifier_call_chain (kernel/notifier.c:95) [ 1982.600053] __raw_notifier_call_chain (kernel/notifier.c:395) [ 1982.600053] __cpu_notify (kernel/cpu.c:202) [ 1982.600053] cpu_notify (kernel/cpu.c:211) [ 1982.600053] take_cpu_down (./arch/x86/include/asm/current.h:14 kernel/cpu.c:312) [ 1982.600053] multi_cpu_stop (kernel/stop_machine.c:201) [ 1982.600053] ? __stop_cpus (kernel/stop_machine.c:170) [ 1982.600053] cpu_stopper_thread (kernel/stop_machine.c:474) [ 1982.600053] ? put_lock_stats.isra.12 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254) [ 1982.600053] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191) [ 1982.600053] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 1982.600053] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599) [ 1982.600053] smpboot_thread_fn (kernel/smpboot.c:160) [ 1982.600053] ? __smpboot_create_thread (kernel/smpboot.c:105) [ 1982.600053] kthread (kernel/kthread.c:210) [ 1982.600053] ? wait_for_completion (kernel/sched/completion.c:77 kernel/sched/completion.c:93 kernel/sched/completion.c:101 kernel/sched/completion.c:122) [ 1982.600053] ? kthread_create_on_node (kernel/kthread.c:176) [ 1982.600053] ret_from_fork (arch/x86/kernel/entry_64.S:349) [ 1982.600053] ? 
kthread_create_on_node (kernel/kthread.c:176) [ 1982.600053] Code: 00 00 00 00 e8 63 ff ff ff 48 83 c4 08 b8 01 00 00 00 5b 5d c3 b8 01 00 00 00 c3 90 65 8b 04 25 a0 da 00 00 a9 00 00 0f 00 75 09 0f 0b 0f 1f 80 00 00 00 00 55 48 89 e5 e8 2f ff ff ff 5d c3 66 All code 0: 00 00 add%al,(%rax) 2: 00 00 add%al,(%rax) 4: e8 63 ff ff ff callq 0xff6c 9: 48 83 c4 08 add$0x8,%rsp d: b8 01 00 00 00 mov$0x1,%eax 12: 5b pop%rbx 13: 5d pop%rbp 14: c3 retq 15: b8 01 00 00 00 mov$0x1,%eax 1a: c3 retq 1b: 90 nop 1c: 65 8b 04 25 a0 da 00mov%gs:0xdaa0,%eax 23: 00 24: a9 00 00 0f 00 test $0xf,%eax 29: 75 09 jne0x34 2b:*0f 0b ud2 -- trapping instruction 2d: 0f 1f 80 00 00
Re: Boot warnings on exynos5420 based boards
On 06/17/2014 05:25 PM, Sachin Kamat wrote: > Hi Srivatsa, > > On Tue, Jun 17, 2014 at 3:24 PM, Srivatsa S. Bhat > wrote: >> On 06/17/2014 03:03 PM, Sachin Kamat wrote: >> >>>> Below is an updated patch, please let me know how it goes. (You'll have to >>>> revert c47a9d7cca first, and then 56e692182, before trying this patch). >>> >>> I am unable to apply your below patch on top of the above 2 reverts. >>> Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU >>> offline >>> fatal: corrupt patch at line 106 >>> Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI >>> callbacks before CPU offline >>> >>> Even with 'patch' I get the below failures: >>> patching file kernel/smp.c >>> Hunk #2 FAILED at 53. >>> Hunk #3 FAILED at 179. >>> 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej >>> >> >> Hmm, weird. My mailer must have screwed it up. >> >> Let's try again: >> >> [In case this also doesn't work for you, please use this git tree in which >> I have reverted the 2 old commits and added this updated patch. >> >> git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3 >> ] > > Unfortunately the attached patch did not apply either. Nevertheless, I > applied the > patch from your above mentioned tree. With that patch I do not see the > warnings > that I mentioned in my first mail. Thanks for fixing it. > Sure, thanks for reporting the bug and testing the updated patch! By the way, I think there is some problem in the workflow that you use to copy-paste/apply the patch. I tried applying both patches (that I sent in 2 different mails) and both applied properly without any problems. Regards, Srivatsa S. Bhat
Re: Boot warnings on exynos5420 based boards
On 06/17/2014 03:03 PM, Sachin Kamat wrote: >> Below is an updated patch, please let me know how it goes. (You'll have to >> revert c47a9d7cca first, and then 56e692182, before trying this patch). > > I am unable to apply your below patch on top of the above 2 reverts. > Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU > offline > fatal: corrupt patch at line 106 > Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI > callbacks before CPU offline > > Even with 'patch' I get the below failures: > patching file kernel/smp.c > Hunk #2 FAILED at 53. > Hunk #3 FAILED at 179. > 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej > Hmm, weird. My mailer must have screwed it up. Let's try again: [In case this also doesn't work for you, please use this git tree in which I have reverted the 2 old commits and added this updated patch. git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3 ] -------- From: Srivatsa S. Bhat [PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline. Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below. During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline. 
         CPU 1                                  CPU 2
      (Online CPU)                         (CPU going offline)

    Enter _PREPARE stage                 Enter _PREPARE stage

                                         Enter _DISABLE_IRQ stage

                                         =========================
    Got a device interrupt, and         | Didn't notice the IPI
    the interrupt handler sent an       | since interrupts were
    IPI to CPU 2 using                  | disabled on this CPU.
    smp_call_function_single_async()    |
                                         =========================

    Enter _DISABLE_IRQ stage

    Enter _RUN stage                     Enter _RUN stage

                                         =========================
    Busy loop with interrupts           | Invoke take_cpu_down()
    disabled.                           | and take CPU 2 offline
                                         =========================

    Enter _EXIT stage                    Enter _EXIT stage

    Re-enable interrupts                 Re-enable interrupts

The pending IPI is noted immediately, but alas, the CPU is offline at this
point. This of course, makes the smp-call-function IPI handler code running
on CPU 2 unhappy and it complains about "receiving an IPI on an offline
CPU".

One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:

    __blk_complete_request() [interrupt-handler]
        raise_blk_irq()
            smp_call_function_single_async()

However, if we look closely, the block layer does check that the target CPU
is online before firing the IPI. So in this case, it is actually the
unfortunate ordering/timing of events in the stop-machine phase that leads
to receiving IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by itself
(this can happen even due to hardware latencies in IPI send-receive). It is
a bug only if the target CPU really went offline without executing all the
callbacks queued on its list. (Note that a CPU is free to execute its
pending smp-call-function callbacks in a batch, without waiting for the
corresponding IPIs to arrive for each one of those callbacks).

So, fixing this issue can be broken up into two parts:

1. Ensure that a CPU goes offline only after executing all the callbacks
   queued on it.

2. Modify the warning condition in the smp-call-function IPI handler code
   such that it warns only if an offline CPU got an IPI *and* that CPU had
   gone offline with callbacks still pending in its queue.
Achieving part 1 is straight-forward - just flush (execute) all the queued callbacks on the outgoing CPU in the CPU_DYING stage.
Re: Boot warnings on exynos5420 based boards
Hi Sachin, On 06/17/2014 01:39 PM, Sachin Kamat wrote: > Hi, > > I observe the below warnings while trying to boot Exynos5420 based boards > since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks I guess you meant next-20140617. > like it is triggered by the commit 56e6921829 ("CPU hotplug, smp: > flush any pending IPI callbacks before CPU offline"). Any ideas? > > > * > [0.046521] Exynos MCPM support installed > [0.048939] CPU1: Booted secondary processor > [0.065005] CPU1: update cpu_capacity 1535 > [0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001 > [0.065660] CPU2: Booted secondary processor > [0.085005] CPU2: update cpu_capacity 1535 > [0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002 > [0.085662] CPU3: Booted secondary processor > [0.105005] CPU3: update cpu_capacity 1535 > [0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003 > [1.105031] CPU4: failed to come online > [1.105081] [ cut here ] > [1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228 > flush_smp_call_function_queue+0xc0/0x178() > [1.105112] Modules linked in: > [1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 3.15.0-next-20140616-2-g38f9385a061b #2035 > [1.105157] [] (unwind_backtrace) from [] > (show_stack+0x10/0x14) > [1.105179] [] (show_stack) from [] > (dump_stack+0x8c/0x9c) > [1.105198] [] (dump_stack) from [] > (warn_slowpath_common+0x70/0x8c) > [1.105216] [] (warn_slowpath_common) from [] > (warn_slowpath_null+0x1c/0x24) > [1.105235] [] (warn_slowpath_null) from [] > (flush_smp_call_function_queue+0xc0/0x178) > [1.105253] [] (flush_smp_call_function_queue) from > [] (hotplug_cfd+0x98/0xd8) > [1.105269] [] (hotplug_cfd) from [] > (notifier_call_chain+0x44/0x84) > [1.105285] [] (notifier_call_chain) from [] > (_cpu_up+0x120/0x170) > [1.105302] [] (_cpu_up) from [] (cpu_up+0x70/0x94) > [1.105319] [] (cpu_up) from [] (smp_init+0xac/0xb0) > [1.105337] [] (smp_init) from [] > (kernel_init_freeable+0x90/0x1dc) > [1.105353] [] (kernel_init_freeable) from 
[] > (kernel_init+0xc/0xe8) > [1.105368] [] (kernel_init) from [] > (ret_from_fork+0x14/0x3c) > [1.105389] ---[ end trace bc66942e4ab63168 ]--- Argh! I had put the switch-case handling for CPU_DYING at the 'wrong' place, since I hadn't noticed that CPU_UP_CANCELED silently falls-through to CPU_DEAD. This is what happens when people don't explicitly write "fall-through" in the comments in a switch-case statement :-( Below is an updated patch, please let me know how it goes. (You'll have to revert c47a9d7cca first, and then 56e692182, before trying this patch). [c47a9d7cca - CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline] [56e692182 - CPU hotplug, smp: flush any pending IPI callbacks before CPU offline] Andrew, can you please use this patch instead? Thanks a lot! -- From: Srivatsa S. Bhat [PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline. Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below. During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline. CPU 1 CPU 2 (Online CPU) (CPU going offline) Enter _PREPARE stage Enter _PREPARE stage Enter _DISABLE_IRQ stage = Got a device interrupt, and | Didn't notice the IPI the interrupt handler sent an | since interrupts were IPI to CPU 2 using | disabled on this CPU. 
smp_call_function_single_async()| = Enter _DISABLE_IRQ stage Enter _RUN stage Enter _RUN stage
Re: Boot warnings on exynos5420 based boards
Hi Sachin, On 06/17/2014 01:39 PM, Sachin Kamat wrote: Hi, I observe the below warnings while trying to boot Exynos5420 based boards since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks I guess you meant next-20140617. like it is triggered by the commit 56e6921829 (CPU hotplug, smp: flush any pending IPI callbacks before CPU offline). Any ideas? * [0.046521] Exynos MCPM support installed [0.048939] CPU1: Booted secondary processor [0.065005] CPU1: update cpu_capacity 1535 [0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001 [0.065660] CPU2: Booted secondary processor [0.085005] CPU2: update cpu_capacity 1535 [0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002 [0.085662] CPU3: Booted secondary processor [0.105005] CPU3: update cpu_capacity 1535 [0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003 [1.105031] CPU4: failed to come online [1.105081] [ cut here ] [1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228 flush_smp_call_function_queue+0xc0/0x178() [1.105112] Modules linked in: [1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-next-20140616-2-g38f9385a061b #2035 [1.105157] [c02160f0] (unwind_backtrace) from [c0211c8c] (show_stack+0x10/0x14) [1.105179] [c0211c8c] (show_stack) from [c0853794] (dump_stack+0x8c/0x9c) [1.105198] [c0853794] (dump_stack) from [c024bdf4] (warn_slowpath_common+0x70/0x8c) [1.105216] [c024bdf4] (warn_slowpath_common) from [c024beac] (warn_slowpath_null+0x1c/0x24) [1.105235] [c024beac] (warn_slowpath_null) from [c02a3944] (flush_smp_call_function_queue+0xc0/0x178) [1.105253] [c02a3944] (flush_smp_call_function_queue) from [c02a3a94] (hotplug_cfd+0x98/0xd8) [1.105269] [c02a3a94] (hotplug_cfd) from [c026b064] (notifier_call_chain+0x44/0x84) [1.105285] [c026b064] (notifier_call_chain) from [c024c1a4] (_cpu_up+0x120/0x170) [1.105302] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94) [1.105319] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0) [1.105337] [c0b5839c] (smp_init) from 
[c0b2fc54] (kernel_init_freeable+0x90/0x1dc) [1.105353] [c0b2fc54] (kernel_init_freeable) from [c0851248] (kernel_init+0xc/0xe8) [1.105368] [c0851248] (kernel_init) from [c020e7f8] (ret_from_fork+0x14/0x3c) [1.105389] ---[ end trace bc66942e4ab63168 ]--- Argh! I had put the switch-case handling for CPU_DYING at the 'wrong' place, since I hadn't noticed that CPU_UP_CANCELED silently falls-through to CPU_DEAD. This is what happens when people don't explicitly write fall-through in the comments in a switch-case statement :-( Below is an updated patch, please let me know how it goes. (You'll have to revert c47a9d7cca first, and then 56e692182, before trying this patch). [c47a9d7cca - CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline] [56e692182 - CPU hotplug, smp: flush any pending IPI callbacks before CPU offline] Andrew, can you please use this patch instead? Thanks a lot! -- From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com [PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline. Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below. During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline. 
CPU 1 CPU 2 (Online CPU) (CPU going offline) Enter _PREPARE stage Enter _PREPARE stage Enter _DISABLE_IRQ stage = Got a device interrupt, and | Didn't notice the IPI the interrupt handler sent an | since interrupts were IPI to CPU 2 using | disabled on this CPU. smp_call_function_single_async()| = Enter _DISABLE_IRQ stage Enter _RUN stage Enter _RUN
Re: Boot warnings on exynos5420 based boards
On 06/17/2014 03:03 PM, Sachin Kamat wrote: Below is an updated patch, please let me know how it goes. (You'll have to revert c47a9d7cca first, and then 56e692182, before trying this patch). I am unable to apply your below patch on top of the above 2 reverts. Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline fatal: corrupt patch at line 106 Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline Even with 'patch' I get the below failures: patching file kernel/smp.c Hunk #2 FAILED at 53. Hunk #3 FAILED at 179. 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej Hmm, weird. My mailer must have screwed it up. Let's try again: [In case this also doesn't work for you, please use this git tree in which I have reverted the 2 old commits and added this updated patch. git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3 ] From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com [PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline. Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below. During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline. 
         CPU 1                                  CPU 2
      (Online CPU)                         (CPU going offline)

    Enter _PREPARE stage                 Enter _PREPARE stage

                                         Enter _DISABLE_IRQ stage

                                         =========================
    Got a device interrupt, and         | Didn't notice the IPI
    the interrupt handler sent an       | since interrupts were
    IPI to CPU 2 using                  | disabled on this CPU.
    smp_call_function_single_async()    |
                                         =========================

    Enter _DISABLE_IRQ stage

    Enter _RUN stage                     Enter _RUN stage

                                         =========================
    Busy loop with interrupts           | Invoke take_cpu_down()
    disabled.                           | and take CPU 2 offline
                                         =========================

    Enter _EXIT stage                    Enter _EXIT stage

    Re-enable interrupts                 Re-enable interrupts

The pending IPI is noted immediately, but alas, the CPU is offline at this
point. This of course, makes the smp-call-function IPI handler code running
on CPU 2 unhappy and it complains about "receiving an IPI on an offline
CPU".

One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:

    __blk_complete_request() [interrupt-handler]
        raise_blk_irq()
            smp_call_function_single_async()

However, if we look closely, the block layer does check that the target CPU
is online before firing the IPI. So in this case, it is actually the
unfortunate ordering/timing of events in the stop-machine phase that leads
to receiving IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by itself
(this can happen even due to hardware latencies in IPI send-receive). It is
a bug only if the target CPU really went offline without executing all the
callbacks queued on its list. (Note that a CPU is free to execute its
pending smp-call-function callbacks in a batch, without waiting for the
corresponding IPIs to arrive for each one of those callbacks).

So, fixing this issue can be broken up into two parts:

1. Ensure that a CPU goes offline only after executing all the callbacks
   queued on it.

2. Modify the warning condition in the smp-call-function IPI handler code
   such that it warns only if an offline CPU got an IPI *and* that CPU had
   gone offline with callbacks still pending in its queue.
Achieving part 1 is straight-forward - just flush (execute) all the queued callbacks on the outgoing CPU in the CPU_DYING stage[1], including those callbacks
Re: Boot warnings on exynos5420 based boards
On 06/17/2014 05:25 PM, Sachin Kamat wrote: Hi Srivatsa, On Tue, Jun 17, 2014 at 3:24 PM, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: On 06/17/2014 03:03 PM, Sachin Kamat wrote: Below is an updated patch, please let me know how it goes. (You'll have to revert c47a9d7cca first, and then 56e692182, before trying this patch). I am unable to apply your below patch on top of the above 2 reverts. Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline fatal: corrupt patch at line 106 Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline Even with 'patch' I get the below failures: patching file kernel/smp.c Hunk #2 FAILED at 53. Hunk #3 FAILED at 179. 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej Hmm, weird. My mailer must have screwed it up. Let's try again: [In case this also doesn't work for you, please use this git tree in which I have reverted the 2 old commits and added this updated patch. git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3 ] Unfortunately the attached patch did not apply either. Nevertheless, I applied the patch from your above mentioned tree. With that patch I do not see the warnings that I mentioned in my first mail. Thanks for fixing it. Sure, thanks for reporting the bug and testing the updated patch! By the way, I think there is some problem in the workflow that you use to copy-paste/apply the patch. I tried applying both patches (that I sent in 2 different mails) and both applied properly without any problems. Regards, Srivatsa S. Bhat
Re: [CPU hotplug, smp] WARNING: CPU: 1 PID: 0 at kernel/smp.c:209 generic_smp_call_function_single_interrupt()
On 06/15/2014 12:56 PM, Jet Chen wrote:
> Hi Srivatsa,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> commit ab7a42783d939cdbe729c18ab32dbf0d25746ea2
> Author:     Srivatsa S. Bhat
> AuthorDate: Thu May 22 10:44:06 2014 +1000
> Commit:     Stephen Rothwell
> CommitDate: Thu May 22 10:44:06 2014 +1000
>
>     CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
>
>     During CPU offline, in the stop-machine loop, we use 2 separate stages to
>     disable interrupts, to ensure that the CPU going offline doesn't get any
>     new IPIs from the other CPUs after it has gone offline.
>
>     However, an IPI sent much earlier might arrive late on the target CPU
>     (possibly _after_ the CPU has gone offline) due to hardware latencies, and
>     due to this, the smp-call-function callbacks queued on the outgoing CPU
>     might not get noticed (and hence not executed) at all.
>
>     This is somewhat theoretical, but in any case, it makes sense to
>     explicitly loop through the call_single_queue and flush any pending
>     callbacks before the CPU goes completely offline. So, flush the queued
>     smp-call-function callbacks in the MULTI_STOP_DISABLE_IRQ_ACTIVE stage,
>     after disabling interrupts on the active CPU. This can be trivially
>     achieved by invoking the generic_smp_call_function_single_interrupt()
>     function itself (and since the outgoing CPU is still online at this point,
>     we won't trigger the "IPI to offline CPU" warning in this function; so we
>     are safe to call it here).
>
>     This way, we would have handled all the queued callbacks before going
>     offline, and also, no new IPIs can be sent by the other CPUs to the
>     outgoing CPU at that point, because they will all be executing the
>     stop-machine code with interrupts disabled.
>
>     Signed-off-by: Srivatsa S. Bhat
>     Suggested-by: Frederic Weisbecker
>     Reviewed-by: Tejun Heo
>     Cc: Peter Zijlstra
>     Cc: Oleg Nesterov
>     Signed-off-by: Andrew Morton

Thanks for reporting this, but this patch has been superseded by an updated version of the patch and you can find that here: https://lkml.org/lkml/2014/6/10/589

Also, this (bad) patch (which you bisected to) is not in linux-next at the moment. So I guess you tested a slightly old version of linux-next.

Thank you!

Regards,
Srivatsa S. Bhat

> +-------------------------------------------------------------------------------+------------+------------+
> |                                                                               | c80d40e1f2 | ab7a42783d |
> +-------------------------------------------------------------------------------+------------+------------+
> | boot_successes                                                                | 1194       | 261        |
> | boot_failures                                                                 | 6          | 39         |
> | BUG:kernel_test_crashed                                                       | 6          |            |
> | WARNING:CPU:PID:at_kernel/smp.c:generic_smp_call_function_single_interrupt()  | 0          | 39         |
> | backtrace:stop_machine_from_inactive_cpu                                      | 0          | 39         |
> | backtrace:mtrr_ap_init                                                        | 0          | 39         |
> | general_protection_fault                                                      | 0          | 0          |
> | RIP:__lock_acquire                                                            | 0          | 0          |
> | BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c   | 0          | 0          |
> | INFO:lockdep_is_turned_off                                                    | 0          | 0          |
> | backtrace:do_mount                                                            | 0          | 0          |
> | backtrace:SyS_mount                                                           | 0          | 0          |
> +-------------------------------------------------------------------------------+------------+------------+
>
> [   62.119017] masked ExtINT on CPU#1
> [   62.119017] numa_add_cpu cpu 1 node 0: mask now 0-1
> [   62.261606] ------------[ cut here ]------------
> [   62.261606] WARNING: CPU: 1 PID: 0 at kernel/smp.c:209 generic_smp_call_function_single_interrupt+0xc5/0x155()
> [   62.261606] IPI on offline CPU 1
> [   62.261606] Modules linked in:
> [   62.261606] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc5-01473-gab7a427 #4510
> [
Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode
Hi Joel,

On 06/12/2014 12:09 PM, Joel Stanley wrote:
> Hi Srivatsa,
>
> On Sat, Jun 7, 2014 at 7:16 AM, Srivatsa S. Bhat wrote:
>> And with the following hunk added (which I had forgotten earlier), it worked
>> just fine on powernv :-)
>
> How are the patches coming along?

I'm still waiting to test this patch series on a PowerVM box, and unfortunately there are some machine issues to debug first :-( So that's why this is taking time... :-(

> I just hung a machine here while attempting to kexec. It appears to
> have onlined all of the secondary threads, and then hung here:
>
> kexec: Waking offline cpu 1.
> kvm: enabling virtualization on CPU1
> kexec: Waking offline cpu 2.
> kvm: enabling virtualization on CPU2
> kexec: Waking offline cpu 3.
> kvm: enabling virtualization on CPU3
> kexec: Waking offline cpu 5.
> kvm: enabling virtualization on CPU5
> [...]
> kvm: enabling virtualization on CPU63
> kexec: waiting for cpu 1 (physical 1) to enter OPAL
> kexec: waiting for cpu 2 (physical 2) to enter OPAL
> kexec: waiting for cpu 3 (physical 3) to enter OPAL
>
> I'm running benh's next branch as of this morning, and SMT was off.

Oh! This looks like a different hang than the one I tried to fix. My patch ("powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode"), which is already in benh's next branch, was aimed at fixing the "CPU is stuck" issue observed during the second kernel's boot. If the first kernel itself is hanging in the down-path, then it looks like a different problem altogether.

> Could you please post your latest patch series? I will test them here.

The 4 patches that I proposed in this thread are aimed at making the above solution more elegant, by not having to actually online the secondary threads while doing kexec. I don't think they will solve the hang that you are seeing. In any case, I'll provide the consolidated patch below if you want to give it a try.
By the way, I have a few questions regarding the hang you observed: is it always reproducible with SMT=off? And if SMT was 8 (i.e., all CPUs in the system were online) and then you did a kexec, do you still see the hang?

Regards,
Srivatsa S. Bhat

---

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 16d7e33..2a31b52 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -68,6 +68,7 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
 	ppc_save_regs(newregs);
 }
 
+extern bool kexec_cpu_wake(void);
 extern void kexec_smp_wait(void);	/* get and clear naca physid, wait for master to copy new code to 0 */
 extern int crashing_cpu;
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f92b0b5..39f721d 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -255,6 +255,16 @@ struct machdep_calls {
 	void		(*machine_shutdown)(void);
 
 #ifdef CONFIG_KEXEC
+#if (defined CONFIG_PPC64) && (defined CONFIG_PPC_BOOK3S)
+
+	/*
+	 * The pseries and powernv book3s platforms have a special requirement
+	 * that soft-offline CPUs have to be woken up before kexec, to avoid
+	 * CPUs getting stuck. This callback prepares the system for the
+	 * impending wakeup of the offline CPUs.
+	 */
+	void		(*kexec_wake_prepare)(void);
+#endif
 	void		(*kexec_cpu_down)(int crash_shutdown, int secondary);
 
 	/* Called to do what every setup is needed on image and the
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 879b3aa..84e91293 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -182,6 +182,14 @@ static void kexec_smp_down(void *arg)
 	/* NOTREACHED */
 }
 
+bool kexec_cpu_wake(void)
+{
+	kexec_smp_down(NULL);
+
+	/* NOTREACHED */
+	return true;
+}
+
 static void kexec_prepare_cpus_wait(int wait_state)
 {
 	int my_cpu, i, notified = -1;
@@ -202,7 +210,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	 * these possible-but-not-online-but-should-be CPUs and chaperone them
 	 * into kexec_smp_wait().
 	 */
-	for_each_online_cpu(i) {
+	for_each_present_cpu(i) {
 		if (i == my_cpu)
 			continue;
@@ -228,16 +236,22 @@ static void kexec_prepare_cpus_wait(int wait_state)
  * threads as offline -- and again, these CPUs will be stuck.
  *
  * So, we online all CPUs that should be running, including secondary threads.
+ *
+ * TODO: Update this comment
  */
 static void wake_offline_cpus(void)
 {
 	int cpu = 0;
 
+	if (ppc_md.kexec_wake_prepare)
+		ppc_md.kexec_wake_prepare();
+
 	for_each_present_cpu(cpu) {
 		if (!cpu_online(cpu
[PATCH v2] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
ak any existing code; hence let's go with the solution proposed above until that is done).

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Srivatsa S. Bhat
Suggested-by: Frederic Weisbecker
Cc: "Paul E. McKenney"
Cc: Borislav Petkov
Cc: Christoph Hellwig
Cc: Frederic Weisbecker
Cc: Gautham R Shenoy
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Mike Galbraith
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Rik van Riel
Cc: Rusty Russell
Cc: Srivatsa S. Bhat
Cc: Steven Rostedt
Cc: Tejun Heo
Cc: Thomas Gleixner
---

Changes in v2:
* Modified the changelog to make it more accurate and easy to understand, based on feedback by Peter Zijlstra.
* Replaced the term "IPI functions" with "IPI callbacks" in the code comments.
* Absolutely no changes in the code.

 kernel/smp.c |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 306f818..ef3941d 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,6 +29,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
 
+static void flush_smp_call_function_queue(bool warn_cpu_offline);
+
 static int
 hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -52,6 +54,20 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DYING:
+	case CPU_DYING_FROZEN:
+		/*
+		 * The IPIs for the smp-call-function callbacks queued by other
+		 * CPUs might arrive late, either due to hardware latencies or
+		 * because this CPU disabled interrupts (inside stop-machine)
+		 * before the IPIs were sent. So flush out any pending callbacks
+		 * explicitly (without waiting for the IPIs to arrive), to
+		 * ensure that the outgoing CPU doesn't go offline with work
+		 * still pending.
+		 */
+		flush_smp_call_function_queue(false);
+		break;
+
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		free_cpumask_var(cfd->cpumask);
@@ -177,23 +193,47 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
 	return 0;
 }
 
-/*
- * Invoked by arch to handle an IPI for call function single. Must be
- * called from the arch with interrupts disabled.
+/**
+ * generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
+ *
+ * Invoked by arch to handle an IPI for call function single.
+ * Must be called with interrupts disabled.
  */
 void generic_smp_call_function_single_interrupt(void)
 {
+	flush_smp_call_function_queue(true);
+}
+
+/**
+ * flush_smp_call_function_queue - Flush pending smp-call-function callbacks
+ *
+ * @warn_cpu_offline: If set to 'true', warn if callbacks were queued on an
+ *		      offline CPU. Skip this check if set to 'false'.
+ *
+ * Flush any pending smp-call-function callbacks queued on this CPU. This is
+ * invoked by the generic IPI handler, as well as by a CPU about to go offline,
+ * to ensure that all pending IPI callbacks are run before it goes completely
+ * offline.
+ *
+ * Loop through the call_single_queue and run all the queued callbacks.
+ * Must be called with interrupts disabled.
+ */
+static void flush_smp_call_function_queue(bool warn_cpu_offline)
+{
+	struct llist_head *head;
 	struct llist_node *entry;
 	struct call_single_data *csd, *csd_next;
 	static bool warned;
 
-	entry = llist_del_all(&__get_cpu_var(call_single_queue));
+	WARN_ON(!irqs_disabled());
+
+	head = &__get_cpu_var(call_single_queue);
+	entry = llist_del_all(head);
 	entry = llist_reverse_order(entry);
 
-	/*
-	 * Shouldn't receive this interrupt on a cpu that is not yet online.
-	 */
-	if (unlikely(!cpu_online(smp_processor_id()) && !warned)) {
+	/* There shouldn't be any pending callbacks on an offline CPU. */
+	if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
+		     !warned && !llist_empty(head))) {
 		warned = true;
 		WARN(1, "IPI on offline CPU %d\n", smp_processor_id());