Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On Wed, 27 Dec 2017 19:45:42 -0800 Alexei Starovoitovwrote: > On 12/27/17 6:34 PM, Masami Hiramatsu wrote: > > On Wed, 27 Dec 2017 14:46:24 -0800 > > Alexei Starovoitov wrote: > > > >> On 12/26/17 9:56 PM, Masami Hiramatsu wrote: > >>> On Tue, 26 Dec 2017 17:57:32 -0800 > >>> Alexei Starovoitov wrote: > >>> > On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote: > > Check whether error injectable event is on function entry or not. > > Currently it checks the event is ftrace-based kprobes or not, > > but that is wrong. It should check if the event is on the entry > > of target function. Since error injection will override a function > > to just return with modified return value, that operation must > > be done before the target function starts making stackframe. > > > > As a side effect, bpf error injection is no need to depend on > > function-tracer. It can work with sw-breakpoint based kprobe > > events too. > > > > Signed-off-by: Masami Hiramatsu > > --- > > kernel/trace/Kconfig|2 -- > > kernel/trace/bpf_trace.c|6 +++--- > > kernel/trace/trace_kprobe.c |8 +--- > > kernel/trace/trace_probe.h | 12 ++-- > > 4 files changed, 14 insertions(+), 14 deletions(-) > > > > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig > > index ae3a2d519e50..6400e1bf97c5 100644 > > --- a/kernel/trace/Kconfig > > +++ b/kernel/trace/Kconfig > > @@ -533,9 +533,7 @@ config FUNCTION_PROFILER > > config BPF_KPROBE_OVERRIDE > > bool "Enable BPF programs to override a kprobed function" > > depends on BPF_EVENTS > > - depends on KPROBES_ON_FTRACE > > depends on HAVE_KPROBE_OVERRIDE > > - depends on DYNAMIC_FTRACE_WITH_REGS > > default n > > help > > Allows BPF to override the execution of a probed function and > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > index f6d2327ecb59..d663660f8392 100644 > > --- a/kernel/trace/bpf_trace.c > > +++ b/kernel/trace/bpf_trace.c > > @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event > > *event, > > int ret = -EEXIST; > > > > /* > > -* Kprobe override only works for ftrace based kprobes, and > > only if they > > -* are on the opt-in list. > > +* Kprobe override only works if they are on the function entry, > > +* and only if they are on the opt-in list. > > */ > > if (prog->kprobe_override && > > - (!trace_kprobe_ftrace(event->tp_event) || > > + (!trace_kprobe_on_func_entry(event->tp_event) || > > !trace_kprobe_error_injectable(event->tp_event))) > > return -EINVAL; > > > > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c > > index 91f4b57dab82..265e3e27e8dc 100644 > > --- a/kernel/trace/trace_kprobe.c > > +++ b/kernel/trace/trace_kprobe.c > > @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long > > trace_kprobe_nhit(struct trace_kprobe *tk) > > return nhit; > > } > > > > -int trace_kprobe_ftrace(struct trace_event_call *call) > > +bool trace_kprobe_on_func_entry(struct trace_event_call *call) > > { > > struct trace_kprobe *tk = (struct trace_kprobe *)call->data; > > - return kprobe_ftrace(>rp.kp); > > + > > + return kprobe_on_func_entry(tk->rp.kp.addr, > > tk->rp.kp.symbol_name, > > + tk->rp.kp.offset); > > That would be nice, but did you test this? > >>> > >>> Yes, because the jprobe, which was only official user of modifying > >>> execution > >>> path using kprobe, did same way to check. (and kretprobe also does it) > >>> > My understanding that kprobe will restore all regs and > here we need to override return ip _and_ value. > >>> > >>> yes, no problem. kprobe restore all regs from pt_regs, including regs->ip. > >>> > Could you add a patch with the test the way Josef did > or describe the steps to test this new mode? > >>> > >>> Would you mean below patch? If so, it should work without any change. > >>> > >>> [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return > >> > >> yeah. I expect bpf_override_return test to work as-is. > >> I'm asking for the test for new functionality added by this patch. > >> In particular kprobe on func entry without ftrace. > >> How did you test it? > > > > This function is used in kretprobe and jprobe. Jprobe was the user of > > "modifying instruction pointer to another function" in kprobes. > > If it doesn't work, jprobe also doesn't work, this means you can not > > modify IP by
Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On Wed, 27 Dec 2017 19:45:42 -0800 Alexei Starovoitov wrote: > On 12/27/17 6:34 PM, Masami Hiramatsu wrote: > > On Wed, 27 Dec 2017 14:46:24 -0800 > > Alexei Starovoitov wrote: > > > >> On 12/26/17 9:56 PM, Masami Hiramatsu wrote: > >>> On Tue, 26 Dec 2017 17:57:32 -0800 > >>> Alexei Starovoitov wrote: > >>> > On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote: > > Check whether error injectable event is on function entry or not. > > Currently it checks the event is ftrace-based kprobes or not, > > but that is wrong. It should check if the event is on the entry > > of target function. Since error injection will override a function > > to just return with modified return value, that operation must > > be done before the target function starts making stackframe. > > > > As a side effect, bpf error injection is no need to depend on > > function-tracer. It can work with sw-breakpoint based kprobe > > events too. > > > > Signed-off-by: Masami Hiramatsu > > --- > > kernel/trace/Kconfig|2 -- > > kernel/trace/bpf_trace.c|6 +++--- > > kernel/trace/trace_kprobe.c |8 +--- > > kernel/trace/trace_probe.h | 12 ++-- > > 4 files changed, 14 insertions(+), 14 deletions(-) > > > > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig > > index ae3a2d519e50..6400e1bf97c5 100644 > > --- a/kernel/trace/Kconfig > > +++ b/kernel/trace/Kconfig > > @@ -533,9 +533,7 @@ config FUNCTION_PROFILER > > config BPF_KPROBE_OVERRIDE > > bool "Enable BPF programs to override a kprobed function" > > depends on BPF_EVENTS > > - depends on KPROBES_ON_FTRACE > > depends on HAVE_KPROBE_OVERRIDE > > - depends on DYNAMIC_FTRACE_WITH_REGS > > default n > > help > > Allows BPF to override the execution of a probed function and > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > index f6d2327ecb59..d663660f8392 100644 > > --- a/kernel/trace/bpf_trace.c > > +++ b/kernel/trace/bpf_trace.c > > @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event > > *event, > > int ret = -EEXIST; > > > > /* > > -* Kprobe override only works for ftrace based kprobes, and > > only if they > > -* are on the opt-in list. > > +* Kprobe override only works if they are on the function entry, > > +* and only if they are on the opt-in list. > > */ > > if (prog->kprobe_override && > > - (!trace_kprobe_ftrace(event->tp_event) || > > + (!trace_kprobe_on_func_entry(event->tp_event) || > > !trace_kprobe_error_injectable(event->tp_event))) > > return -EINVAL; > > > > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c > > index 91f4b57dab82..265e3e27e8dc 100644 > > --- a/kernel/trace/trace_kprobe.c > > +++ b/kernel/trace/trace_kprobe.c > > @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long > > trace_kprobe_nhit(struct trace_kprobe *tk) > > return nhit; > > } > > > > -int trace_kprobe_ftrace(struct trace_event_call *call) > > +bool trace_kprobe_on_func_entry(struct trace_event_call *call) > > { > > struct trace_kprobe *tk = (struct trace_kprobe *)call->data; > > - return kprobe_ftrace(>rp.kp); > > + > > + return kprobe_on_func_entry(tk->rp.kp.addr, > > tk->rp.kp.symbol_name, > > + tk->rp.kp.offset); > > That would be nice, but did you test this? > >>> > >>> Yes, because the jprobe, which was only official user of modifying > >>> execution > >>> path using kprobe, did same way to check. (and kretprobe also does it) > >>> > My understanding that kprobe will restore all regs and > here we need to override return ip _and_ value. > >>> > >>> yes, no problem. kprobe restore all regs from pt_regs, including regs->ip. > >>> > Could you add a patch with the test the way Josef did > or describe the steps to test this new mode? > >>> > >>> Would you mean below patch? If so, it should work without any change. > >>> > >>> [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return > >> > >> yeah. I expect bpf_override_return test to work as-is. > >> I'm asking for the test for new functionality added by this patch. > >> In particular kprobe on func entry without ftrace. > >> How did you test it? > > > > This function is used in kretprobe and jprobe. Jprobe was the user of > > "modifying instruction pointer to another function" in kprobes. > > If it doesn't work, jprobe also doesn't work, this means you can not > > modify IP by kprobes anymore. > > Anyway, until linux-4.13, that was well tested by kprobe smoke
Re: WARNING in strp_data_ready
On Thu, Dec 28, 2017 at 2:19 AM, Tom Herbertwrote: > On Wed, Dec 27, 2017 at 12:20 PM, Ozgur wrote: >> >> >> 27.12.2017, 23:14, "Dmitry Vyukov" : >>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur wrote: 27.12.2017, 22:21, "Dmitry Vyukov" : > On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert > wrote: >> Did you try the patch I posted? > > Hi Tom, Hello Dmitry, > No. And I didn't know I need to. Why? > If you think the patch needs additional testing, you can ask syzbot to > test it. See > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot > Otherwise proceed with committing it. Or what are we waiting for? > > Thanks I think we need to fixed patch for crash, in fact check to patch code and test solve the bug. How do test it because there is no patch in the following bug? >>> >>> Hi Ozgur, >>> >>> I am not sure I completely understand what you mean. But the >>> reproducer for this bug (which one can use for testing) is here: >>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY >>> Tom also mentions there is some patch for this, but I don't know where >>> it is, it doesn't seem to be referenced from this thread. >> >> Hello Dmitry, >> >> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :) >> I think Tom send patch to only you and are you tested? >> >> kcmsock.c will change and strp_data_ready I think locked. >> >> Tom, please send a patch for me? I can test and inform you. >> > Hi Ozgur, > > I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you > can! OK, I will work as your typist this time: #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master But I wonder what part of the following you don't understand? Do we need to improve wording or something? > If you think the patch needs additional testing, you can ask syzbot to test > it. > See > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot Also I don't know what git repo/branch you have in mind. Kernel patches don't generally apply just to any tree. Fingers crossed that I guessed correctly and it will apply. diff --git a/include/net/sock.h b/include/net/sock.h index 9155da422692..7a7b14e9628a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1514,6 +1514,11 @@ static inline bool sock_owned_by_user(const struct sock *sk) return sk->sk_lock.owned; } +static inline bool sock_owned_by_user_nocheck(const struct sock *sk) +{ + return sk->sk_lock.owned; +} + /* no reclassification while locks are held */ static inline bool sock_allow_reclassification(const struct sock *csk) { diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c index c5fda15ba319..1fdab5c4eda8 100644 --- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -401,7 +401,7 @@ void strp_data_ready(struct strparser *strp) * allows a thread in BH context to safely check if the process * lock is held. In this case, if the lock is held, queue work. */ - if (sock_owned_by_user(strp->sk)) { + if (sock_owned_by_user_nocheck(strp->sk)) { queue_work(strp_wq, >work); return; }
Re: WARNING in strp_data_ready
On Thu, Dec 28, 2017 at 2:19 AM, Tom Herbert wrote: > On Wed, Dec 27, 2017 at 12:20 PM, Ozgur wrote: >> >> >> 27.12.2017, 23:14, "Dmitry Vyukov" : >>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur wrote: 27.12.2017, 22:21, "Dmitry Vyukov" : > On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert > wrote: >> Did you try the patch I posted? > > Hi Tom, Hello Dmitry, > No. And I didn't know I need to. Why? > If you think the patch needs additional testing, you can ask syzbot to > test it. See > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot > Otherwise proceed with committing it. Or what are we waiting for? > > Thanks I think we need to fixed patch for crash, in fact check to patch code and test solve the bug. How do test it because there is no patch in the following bug? >>> >>> Hi Ozgur, >>> >>> I am not sure I completely understand what you mean. But the >>> reproducer for this bug (which one can use for testing) is here: >>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY >>> Tom also mentions there is some patch for this, but I don't know where >>> it is, it doesn't seem to be referenced from this thread. >> >> Hello Dmitry, >> >> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :) >> I think Tom send patch to only you and are you tested? >> >> kcmsock.c will change and strp_data_ready I think locked. >> >> Tom, please send a patch for me? I can test and inform you. >> > Hi Ozgur, > > I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you > can! OK, I will work as your typist this time: #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master But I wonder what part of the following you don't understand? Do we need to improve wording or something? > If you think the patch needs additional testing, you can ask syzbot to test > it. > See > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot Also I don't know what git repo/branch you have in mind. Kernel patches don't generally apply just to any tree. Fingers crossed that I guessed correctly and it will apply. diff --git a/include/net/sock.h b/include/net/sock.h index 9155da422692..7a7b14e9628a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1514,6 +1514,11 @@ static inline bool sock_owned_by_user(const struct sock *sk) return sk->sk_lock.owned; } +static inline bool sock_owned_by_user_nocheck(const struct sock *sk) +{ + return sk->sk_lock.owned; +} + /* no reclassification while locks are held */ static inline bool sock_allow_reclassification(const struct sock *csk) { diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c index c5fda15ba319..1fdab5c4eda8 100644 --- a/net/strparser/strparser.c +++ b/net/strparser/strparser.c @@ -401,7 +401,7 @@ void strp_data_ready(struct strparser *strp) * allows a thread in BH context to safely check if the process * lock is held. In this case, if the lock is held, queue work. */ - if (sock_owned_by_user(strp->sk)) { + if (sock_owned_by_user_nocheck(strp->sk)) { queue_work(strp_wq, >work); return; }
Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework
On Wed, 27 Dec 2017 19:49:28 -0800 Alexei Starovoitovwrote: > On 12/27/17 5:38 PM, Masami Hiramatsu wrote: > > On Wed, 27 Dec 2017 14:49:46 -0800 > > Alexei Starovoitov wrote: > > > >> On 12/27/17 12:09 AM, Masami Hiramatsu wrote: > >>> On Tue, 26 Dec 2017 18:12:56 -0800 > >>> Alexei Starovoitov wrote: > >>> > On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote: > > Support in-kernel fault-injection framework via debugfs. > > This allows you to inject a conditional error to specified > > function using debugfs interfaces. > > > > Signed-off-by: Masami Hiramatsu > > --- > > Documentation/fault-injection/fault-injection.txt |5 + > > kernel/Makefile |1 > > kernel/fail_function.c| 169 > > + > > lib/Kconfig.debug | 10 + > > 4 files changed, 185 insertions(+) > > create mode 100644 kernel/fail_function.c > > > > diff --git a/Documentation/fault-injection/fault-injection.txt > > b/Documentation/fault-injection/fault-injection.txt > > index 918972babcd8..6243a588dd71 100644 > > --- a/Documentation/fault-injection/fault-injection.txt > > +++ b/Documentation/fault-injection/fault-injection.txt > > @@ -30,6 +30,11 @@ o fail_mmc_request > >injects MMC data errors on devices permitted by setting > >debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request > > > > +o fail_function > > + > > + injects error return on specific functions by setting debugfs entries > > + under /sys/kernel/debug/fail_function. No boot option supported. > > I like it. > Could you document it a bit better? > >>> > >>> Yes, I will do in next series. > >>> > In particular retval is configurable, but without an example no one > will be able to figure out how to use it. > >>> > >>> Ah, right. BTW, as I pointed in the covermail, should we store the > >>> expected error value range into the injectable list? e.g. > >>> > >>> ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO) > >>> > >>> And provide APIs to check/get it. > >> > >> I'm afraid such check would be too costly. > >> Right now we have only two functions marked but I expect hundreds more > >> will be added in the near future as soon as developers realize the > >> potential of such error injection. > >> All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data. > >> Multiple by 1k and we have 8k of data spent on marks. > >> If we add max/min range marks that doubles it for very little use. > >> I think marking function only is enough. > > > > Sorry, I don't think so. > > Even if it takes 16 bytes more for each points, I don't think it is > > any overhead for machines in these days. Even if so, we can provide > > a kconfig to reduce it. > > I mean, we are living in GB-order memory are, and it will be bigger > > in the future. Why we have to worry about hundreds of 16bytes memory > > pieces? It will take a few KB, and even if we mark thousands of > > functions, it never reaches 1MB, in GB memory pool. :) > > > > Of course, for many small-footprint embedded devices (like having > > less than 128MB memory), this feature can be a overhead. But they > > can cut off the table by kconfig. > > I still disagree on wasting 16-byte * num_of_funcs of .data here. > The trade-off of usability vs memory just not worth it. Sorry. > Please focus on testing your changes instead. Then what happen if the user set invalid retval to those functions? even if we limit the injectable functions, it can cause a problem, for example, obj = func_return_object(); if (!obj) { handling_error...; } obj->field = x; In this case, obviously func_return_object() must return NULL if there is an error, not -ENOMEM. But without the correct retval information, how would you check the BPF code doesn't cause a trouble? Currently it seems you are expecting only the functions which return error code. ret = func_return_state(); if (ret < 0) { handling_error...; } But how we can distinguish those? If we have the error range for each function, we can ensure what is *correct* error code, NULL or errno, or any other error numbers. :) At least fail_function needs this feature because it can check return value when setting it up. Thank you, -- Masami Hiramatsu
Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework
On Wed, 27 Dec 2017 19:49:28 -0800 Alexei Starovoitov wrote: > On 12/27/17 5:38 PM, Masami Hiramatsu wrote: > > On Wed, 27 Dec 2017 14:49:46 -0800 > > Alexei Starovoitov wrote: > > > >> On 12/27/17 12:09 AM, Masami Hiramatsu wrote: > >>> On Tue, 26 Dec 2017 18:12:56 -0800 > >>> Alexei Starovoitov wrote: > >>> > On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote: > > Support in-kernel fault-injection framework via debugfs. > > This allows you to inject a conditional error to specified > > function using debugfs interfaces. > > > > Signed-off-by: Masami Hiramatsu > > --- > > Documentation/fault-injection/fault-injection.txt |5 + > > kernel/Makefile |1 > > kernel/fail_function.c| 169 > > + > > lib/Kconfig.debug | 10 + > > 4 files changed, 185 insertions(+) > > create mode 100644 kernel/fail_function.c > > > > diff --git a/Documentation/fault-injection/fault-injection.txt > > b/Documentation/fault-injection/fault-injection.txt > > index 918972babcd8..6243a588dd71 100644 > > --- a/Documentation/fault-injection/fault-injection.txt > > +++ b/Documentation/fault-injection/fault-injection.txt > > @@ -30,6 +30,11 @@ o fail_mmc_request > >injects MMC data errors on devices permitted by setting > >debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request > > > > +o fail_function > > + > > + injects error return on specific functions by setting debugfs entries > > + under /sys/kernel/debug/fail_function. No boot option supported. > > I like it. > Could you document it a bit better? > >>> > >>> Yes, I will do in next series. > >>> > In particular retval is configurable, but without an example no one > will be able to figure out how to use it. > >>> > >>> Ah, right. BTW, as I pointed in the covermail, should we store the > >>> expected error value range into the injectable list? e.g. > >>> > >>> ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO) > >>> > >>> And provide APIs to check/get it. > >> > >> I'm afraid such check would be too costly. > >> Right now we have only two functions marked but I expect hundreds more > >> will be added in the near future as soon as developers realize the > >> potential of such error injection. > >> All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data. > >> Multiple by 1k and we have 8k of data spent on marks. > >> If we add max/min range marks that doubles it for very little use. > >> I think marking function only is enough. > > > > Sorry, I don't think so. > > Even if it takes 16 bytes more for each points, I don't think it is > > any overhead for machines in these days. Even if so, we can provide > > a kconfig to reduce it. > > I mean, we are living in GB-order memory are, and it will be bigger > > in the future. Why we have to worry about hundreds of 16bytes memory > > pieces? It will take a few KB, and even if we mark thousands of > > functions, it never reaches 1MB, in GB memory pool. :) > > > > Of course, for many small-footprint embedded devices (like having > > less than 128MB memory), this feature can be a overhead. But they > > can cut off the table by kconfig. > > I still disagree on wasting 16-byte * num_of_funcs of .data here. > The trade-off of usability vs memory just not worth it. Sorry. > Please focus on testing your changes instead. Then what happen if the user set invalid retval to those functions? even if we limit the injectable functions, it can cause a problem, for example, obj = func_return_object(); if (!obj) { handling_error...; } obj->field = x; In this case, obviously func_return_object() must return NULL if there is an error, not -ENOMEM. But without the correct retval information, how would you check the BPF code doesn't cause a trouble? Currently it seems you are expecting only the functions which return error code. ret = func_return_state(); if (ret < 0) { handling_error...; } But how we can distinguish those? If we have the error range for each function, we can ensure what is *correct* error code, NULL or errno, or any other error numbers. :) At least fail_function needs this feature because it can check return value when setting it up. Thank you, -- Masami Hiramatsu
[PATCH v2] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
If we can't get inode lock immediately in the function ocfs2_inode_lock_with_page() when reading a page, we should not return directly here, since this will lead to a softlockup problem when the kernel is configured with CONFIG_PREEMPT is not set. The method is to get a blocking lock and immediately unlock before returning, this can avoid CPU resource waste due to lots of retries, and benefits fairness in getting lock among multiple nodes, increase efficiency in case modifying the same file frequently from multiple nodes. The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1) looks like, Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x5c/0x82 panic+0xd5/0x21e watchdog_timer_fn+0x208/0x210 ? watchdog_park_threads+0x70/0x70 __hrtimer_run_queues+0xcc/0x200 hrtimer_interrupt+0xa6/0x1f0 smp_apic_timer_interrupt+0x34/0x50 apic_timer_interrupt+0x96/0xa0 RIP: 0010:unlock_page+0x17/0x30 RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10 RAX: dead0100 RBX: f21e009f5300 RCX: 0004 RDX: dead00ff RSI: 0202 RDI: f21e009f5300 RBP: R08: R09: af154080bb00 R10: af154080bc30 R11: 0040 R12: 993749a39518 R13: R14: f21e009f5300 R15: f21e009f5300 ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2] ocfs2_readpage+0x41/0x2d0 [ocfs2] ? pagecache_get_page+0x30/0x200 filemap_fault+0x12b/0x5c0 ? recalc_sigpending+0x17/0x50 ? __set_task_blocked+0x28/0x70 ? __set_current_blocked+0x3d/0x60 ocfs2_fault+0x29/0xb0 [ocfs2] __do_fault+0x1a/0xa0 __handle_mm_fault+0xbe8/0x1090 handle_mm_fault+0xaa/0x1f0 __do_page_fault+0x235/0x4b0 trace_do_page_fault+0x3c/0x110 async_page_fault+0x28/0x30 RIP: 0033:0x7fa75ded638e RSP: 002b:7ffd6657db18 EFLAGS: 00010287 RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700 RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700 RBP: 0003 R08: 000e R09: R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770 R13: 000e R14: 1770 R15: About performance improvement, we can see the testing time is reduced, and CPU utilization decreases, the detailed data is as follows. I ran multi_mmap test case in ocfs2-test package in a three nodes cluster. Before apply this patch, PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 2754 ocfs2te+ 20 0 170248 6980 4856 D 80.73 0.341 0:18.71 multi_mmap 1505 root rt 0 36 123060 97224 S 2.658 6.015 0:01.44 corosync 5 root 20 0 0 0 0 S 1.329 0.000 0:00.19 kworker/u8:0 95 root 20 0 0 0 0 S 1.329 0.000 0:00.25 kworker/u8:1 2728 root 20 0 0 0 0 S 0.997 0.000 0:00.24 jbd2/sda1-33 2721 root 20 0 0 0 0 S 0.664 0.000 0:00.07 ocfs2dc-3C8CFD4 2750 ocfs2te+ 20 0 142976 4652 3532 S 0.664 0.227 0:00.28 mpirun ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 14:44:52 CST 2017 multi_mmap..Passed. Runtime 783 seconds. After apply this patch, PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 2508 ocfs2te+ 20 0 170248 6804 4680 R 54.00 0.333 0:55.37 multi_mmap 155 root 20 0 0 0 0 S 2.667 0.000 0:01.20 kworker/u8:3 95 root 20 0 0 0 0 S 2.000 0.000 0:01.58 kworker/u8:1 2504 ocfs2te+ 20 0 142976 4604 3480 R 1.667 0.225 0:01.65 mpirun 5 root 20 0 0 0 0 S 1.000 0.000 0:01.36 kworker/u8:0 2482 root 20 0 0 0 0 S 1.000 0.000 0:00.86 jbd2/sda1-33 299 root 0 -20 0 0 0 S 0.333 0.000 0:00.13 kworker/2:1H 335 root 0 -20 0 0 0 S 0.333 0.000 0:00.17 kworker/1:1H 535 root 20 0 12140 7268 1456 S 0.333 0.355 0:00.34 haveged 1282 root rt 0 84 123108 97224 S 0.333 6.017 0:01.33 corosync ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 15:04:12 CST 2017 multi_mmap..Passed. Runtime 487 seconds. Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock") Signed-off-by: Gang He--- fs/ocfs2/dlmglue.c | 9 + 1 file changed, 9
[PATCH v2] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
If we can't get inode lock immediately in the function ocfs2_inode_lock_with_page() when reading a page, we should not return directly here, since this will lead to a softlockup problem when the kernel is configured with CONFIG_PREEMPT is not set. The method is to get a blocking lock and immediately unlock before returning, this can avoid CPU resource waste due to lots of retries, and benefits fairness in getting lock among multiple nodes, increase efficiency in case modifying the same file frequently from multiple nodes. The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1) looks like, Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x5c/0x82 panic+0xd5/0x21e watchdog_timer_fn+0x208/0x210 ? watchdog_park_threads+0x70/0x70 __hrtimer_run_queues+0xcc/0x200 hrtimer_interrupt+0xa6/0x1f0 smp_apic_timer_interrupt+0x34/0x50 apic_timer_interrupt+0x96/0xa0 RIP: 0010:unlock_page+0x17/0x30 RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10 RAX: dead0100 RBX: f21e009f5300 RCX: 0004 RDX: dead00ff RSI: 0202 RDI: f21e009f5300 RBP: R08: R09: af154080bb00 R10: af154080bc30 R11: 0040 R12: 993749a39518 R13: R14: f21e009f5300 R15: f21e009f5300 ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2] ocfs2_readpage+0x41/0x2d0 [ocfs2] ? pagecache_get_page+0x30/0x200 filemap_fault+0x12b/0x5c0 ? recalc_sigpending+0x17/0x50 ? __set_task_blocked+0x28/0x70 ? __set_current_blocked+0x3d/0x60 ocfs2_fault+0x29/0xb0 [ocfs2] __do_fault+0x1a/0xa0 __handle_mm_fault+0xbe8/0x1090 handle_mm_fault+0xaa/0x1f0 __do_page_fault+0x235/0x4b0 trace_do_page_fault+0x3c/0x110 async_page_fault+0x28/0x30 RIP: 0033:0x7fa75ded638e RSP: 002b:7ffd6657db18 EFLAGS: 00010287 RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700 RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700 RBP: 0003 R08: 000e R09: R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770 R13: 000e R14: 1770 R15: About performance improvement, we can see the testing time is reduced, and CPU utilization decreases, the detailed data is as follows. I ran multi_mmap test case in ocfs2-test package in a three nodes cluster. Before apply this patch, PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 2754 ocfs2te+ 20 0 170248 6980 4856 D 80.73 0.341 0:18.71 multi_mmap 1505 root rt 0 36 123060 97224 S 2.658 6.015 0:01.44 corosync 5 root 20 0 0 0 0 S 1.329 0.000 0:00.19 kworker/u8:0 95 root 20 0 0 0 0 S 1.329 0.000 0:00.25 kworker/u8:1 2728 root 20 0 0 0 0 S 0.997 0.000 0:00.24 jbd2/sda1-33 2721 root 20 0 0 0 0 S 0.664 0.000 0:00.07 ocfs2dc-3C8CFD4 2750 ocfs2te+ 20 0 142976 4652 3532 S 0.664 0.227 0:00.28 mpirun ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 14:44:52 CST 2017 multi_mmap..Passed. Runtime 783 seconds. After apply this patch, PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 2508 ocfs2te+ 20 0 170248 6804 4680 R 54.00 0.333 0:55.37 multi_mmap 155 root 20 0 0 0 0 S 2.667 0.000 0:01.20 kworker/u8:3 95 root 20 0 0 0 0 S 2.000 0.000 0:01.58 kworker/u8:1 2504 ocfs2te+ 20 0 142976 4604 3480 R 1.667 0.225 0:01.65 mpirun 5 root 20 0 0 0 0 S 1.000 0.000 0:01.36 kworker/u8:0 2482 root 20 0 0 0 0 S 1.000 0.000 0:00.86 jbd2/sda1-33 299 root 0 -20 0 0 0 S 0.333 0.000 0:00.13 kworker/2:1H 335 root 0 -20 0 0 0 S 0.333 0.000 0:00.17 kworker/1:1H 535 root 20 0 12140 7268 1456 S 0.333 0.355 0:00.34 haveged 1282 root rt 0 84 123108 97224 S 0.333 6.017 0:01.33 corosync ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 15:04:12 CST 2017 multi_mmap..Passed. Runtime 487 seconds. Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock") Signed-off-by: Gang He --- fs/ocfs2/dlmglue.c | 9 + 1 file changed, 9 insertions(+)
Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface
On Wed, Dec 27, 2017 at 10:24:01PM +, Russell King - ARM Linux wrote: > On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote: > > This patch enables the fourth network interface on the Marvell > > Macchiatobin. It is configured in the 2500Base-X PHY mode. > > > > Signed-off-by: Antoine Tenart> > --- > > arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts > > b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts > > index b3350827ee55..c51efd511324 100644 > > --- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts > > +++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts > > @@ -279,6 +279,14 @@ > > phys = <_comphy0 1>; > > }; > > > > +_eth2 { > > + /* CPS Lane 5 */ > > + status = "okay"; > > + phy-mode = "2500base-x"; > > + /* Generic PHY, providing serdes lanes */ > > + phys = <_comphy5 2>; > > +}; > > + > > This is wrong. This lane is connected to a SFP cage which can support > more than just 2500base-X. Tying it in this way to 2500base-X means > that this port does not support conenctions at 1000base-X, despite > that's one of the most popular and more standardised speeds. > Hi Antoine I agree with Russell here. SFP modules are hot pluggable, and support a range of interface modes. You need to query what the SFP module is in order to know how to configure the SERDES interface. The phylink infrastructure does that for you. Andrew
Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface
On Wed, Dec 27, 2017 at 10:24:01PM +, Russell King - ARM Linux wrote: > On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote: > > This patch enables the fourth network interface on the Marvell > > Macchiatobin. It is configured in the 2500Base-X PHY mode. > > > > Signed-off-by: Antoine Tenart > > --- > > arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts > > b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts > > index b3350827ee55..c51efd511324 100644 > > --- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts > > +++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts > > @@ -279,6 +279,14 @@ > > phys = <_comphy0 1>; > > }; > > > > +_eth2 { > > + /* CPS Lane 5 */ > > + status = "okay"; > > + phy-mode = "2500base-x"; > > + /* Generic PHY, providing serdes lanes */ > > + phys = <_comphy5 2>; > > +}; > > + > > This is wrong. This lane is connected to a SFP cage which can support > more than just 2500base-X. Tying it in this way to 2500base-X means > that this port does not support conenctions at 1000base-X, despite > that's one of the most popular and more standardised speeds. > Hi Antoine I agree with Russell here. SFP modules are hot pluggable, and support a range of interface modes. You need to query what the SFP module is in order to know how to configure the SERDES interface. The phylink infrastructure does that for you. Andrew
Re: mmots build error (2)
On Thu, Dec 28, 2017 at 8:32 AM, syzbotwrote: > Hello, > > syzkaller hit the following crash on > 253f36c7c1aee654871b4f0c5e16ee6c396bff9a > git://git.cmpxchg.org/linux-mmots.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > Unfortunately, I don't have any reproducer for this bug yet. > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: > It will help syzbot understand when the bug is fixed. See footer for > details. > If you forward the report, please keep this part and the footer. +mm the build is broken as: failed to run /usr/bin/make [make bzImage -j 32 CC=/syzkaller/gcc/bin/gcc]: exit status 2 scripts/kconfig/conf --silentoldconfig Kconfig CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h UPD include/config/kernel.release CHK include/generated/utsrelease.h UPD include/generated/utsrelease.h CC kernel/bounds.s CHK include/generated/timeconst.h CC scripts/mod/empty.o CC scripts/mod/devicetable-offsets.s MKELF scripts/mod/elfconfig.h HOSTCC scripts/mod/modpost.o HOSTCC scripts/mod/sumversion.o CHK scripts/mod/devicetable-offsets.h HOSTCC scripts/mod/file2alias.o HOSTLD scripts/mod/modpost CHK include/generated/bounds.h CC arch/x86/kernel/asm-offsets.s In file included from ./arch/x86/include/asm/pgtable_types.h:250:0, from ./arch/x86/include/asm/paravirt_types.h:45, from ./arch/x86/include/asm/ptrace.h:92, from ./arch/x86/include/asm/math_emu.h:5, from ./arch/x86/include/asm/processor.h:12, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:38, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:81, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from ./include/linux/crypto.h:24, from arch/x86/kernel/asm-offsets.c:9: ./arch/x86/include/asm/pgtable_64_types.h:97:1: error: version control conflict marker in file <<< HEAD ^~~ In file included from ./arch/x86/include/asm/paravirt_types.h:45:0, from ./arch/x86/include/asm/ptrace.h:92, from ./arch/x86/include/asm/math_emu.h:5, from ./arch/x86/include/asm/processor.h:12, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:38, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:81, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from ./include/linux/crypto.h:24, from arch/x86/kernel/asm-offsets.c:9: ./arch/x86/include/asm/pgtable_types.h:266:47: warning: data definition has no type or storage class typedef struct pgprot { pgprotval_t pgprot; } pgprot_t; ^~~~ ./arch/x86/include/asm/pgtable_types.h:266:47: error: type defaults to ‘int’ in declaration of ‘pgprot_t’ [-Werror=implicit-int] In file included from ./arch/x86/include/asm/paravirt_types.h:45:0, from ./arch/x86/include/asm/ptrace.h:92, from ./arch/x86/include/asm/math_emu.h:5, from ./arch/x86/include/asm/processor.h:12, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:38, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:81, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from ./include/linux/crypto.h:24, from arch/x86/kernel/asm-offsets.c:9: ./arch/x86/include/asm/pgtable_types.h:441:24: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘cachemode2pgprot’ static inline pgprot_t cachemode2pgprot(enum page_cache_mode pcm) ^~~~ ./arch/x86/include/asm/pgtable_types.h:445:53: error: expected declaration specifiers or ‘...’ before ‘pgprot_t’ static inline enum page_cache_mode pgprot2cachemode(pgprot_t pgprot)
Re: mmots build error (2)
On Thu, Dec 28, 2017 at 8:32 AM, syzbot wrote: > Hello, > > syzkaller hit the following crash on > 253f36c7c1aee654871b4f0c5e16ee6c396bff9a > git://git.cmpxchg.org/linux-mmots.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > Unfortunately, I don't have any reproducer for this bug yet. > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: > It will help syzbot understand when the bug is fixed. See footer for > details. > If you forward the report, please keep this part and the footer. +mm the build is broken as: failed to run /usr/bin/make [make bzImage -j 32 CC=/syzkaller/gcc/bin/gcc]: exit status 2 scripts/kconfig/conf --silentoldconfig Kconfig CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h UPD include/config/kernel.release CHK include/generated/utsrelease.h UPD include/generated/utsrelease.h CC kernel/bounds.s CHK include/generated/timeconst.h CC scripts/mod/empty.o CC scripts/mod/devicetable-offsets.s MKELF scripts/mod/elfconfig.h HOSTCC scripts/mod/modpost.o HOSTCC scripts/mod/sumversion.o CHK scripts/mod/devicetable-offsets.h HOSTCC scripts/mod/file2alias.o HOSTLD scripts/mod/modpost CHK include/generated/bounds.h CC arch/x86/kernel/asm-offsets.s In file included from ./arch/x86/include/asm/pgtable_types.h:250:0, from ./arch/x86/include/asm/paravirt_types.h:45, from ./arch/x86/include/asm/ptrace.h:92, from ./arch/x86/include/asm/math_emu.h:5, from ./arch/x86/include/asm/processor.h:12, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:38, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:81, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from ./include/linux/crypto.h:24, from arch/x86/kernel/asm-offsets.c:9: ./arch/x86/include/asm/pgtable_64_types.h:97:1: error: version control conflict marker in file <<< HEAD ^~~ In file included from ./arch/x86/include/asm/paravirt_types.h:45:0, from ./arch/x86/include/asm/ptrace.h:92, from ./arch/x86/include/asm/math_emu.h:5, from ./arch/x86/include/asm/processor.h:12, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:38, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:81, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from ./include/linux/crypto.h:24, from arch/x86/kernel/asm-offsets.c:9: ./arch/x86/include/asm/pgtable_types.h:266:47: warning: data definition has no type or storage class typedef struct pgprot { pgprotval_t pgprot; } pgprot_t; ^~~~ ./arch/x86/include/asm/pgtable_types.h:266:47: error: type defaults to ‘int’ in declaration of ‘pgprot_t’ [-Werror=implicit-int] In file included from ./arch/x86/include/asm/paravirt_types.h:45:0, from ./arch/x86/include/asm/ptrace.h:92, from ./arch/x86/include/asm/math_emu.h:5, from ./arch/x86/include/asm/processor.h:12, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:38, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:81, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from ./include/linux/crypto.h:24, from arch/x86/kernel/asm-offsets.c:9: ./arch/x86/include/asm/pgtable_types.h:441:24: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘cachemode2pgprot’ static inline pgprot_t cachemode2pgprot(enum page_cache_mode pcm) ^~~~ ./arch/x86/include/asm/pgtable_types.h:445:53: error: expected declaration specifiers or ‘...’ before ‘pgprot_t’ static inline enum page_cache_mode pgprot2cachemode(pgprot_t pgprot) ^~~~ ./arch/x86/include/asm/pgtable_types.h:454:24: error: expected ‘=’,
Re: WARNING: kernel stack regs has bad 'bp' value (2)
On Wed, Dec 27, 2017 at 7:29 PM, Dmitry Vyukovwrote: > On Thu, Nov 30, 2017 at 10:17 AM, Eric Biggers wrote: >> On Tue, Nov 28, 2017 at 10:36:01AM -0800, syzbot wrote: >>> WARNING: kernel stack regs at 8801c1e5f468 in syzkaller196611:6199 has >>> bad 'bp' value 0001 >>> unwind stack type:0 next_sp: (null) mask:0x6 graph_idx:0 >>> 8801db4075a8: 8801db407630 (0x8801db407630) >>> 8801db4075b0: 8128a84e (__save_stack_trace+0x6e/0xd0) >>> 8801db4075b8: ... >>> 8801db4075c0: 8801c1e58000 (0x8801c1e58000) >>> 8801db4075c8: 8801c1e6 (0x8801c1e6) >>> 8801db4075d0: ... >>> 8801db4075d8: 0006 (0x6) >>> 8801db4075e0: 8801c1e4e000 (0x8801c1e4e000) >>> 8801db4075e8: 0101 (0x101) >>> 8801db4075f0: ... >>> 8801db4075f8: 8801db4075a8 (0x8801db4075a8) >>> 8801db407600: 8134ff7d (__twofish_enc_blk_3way+0x1b1d/0x1b30) >> >> Looks like the x86_64 "3 way" version of Twofish >> (twofish-x86_64-asm_64-3way.S) >> needs to be updated to not use %rbp. > > > This is what is supposed to be fixed with "crypto: x86/twofish-3way - > Fix %rbp usage", right? Was it merged anywhere? > This is one of top crashers with 15K crashes. #syz fix: crypto: x86/twofish-3way - Fix %rbp usage
Re: WARNING: kernel stack regs has bad 'bp' value (2)
On Wed, Dec 27, 2017 at 7:29 PM, Dmitry Vyukov wrote: > On Thu, Nov 30, 2017 at 10:17 AM, Eric Biggers wrote: >> On Tue, Nov 28, 2017 at 10:36:01AM -0800, syzbot wrote: >>> WARNING: kernel stack regs at 8801c1e5f468 in syzkaller196611:6199 has >>> bad 'bp' value 0001 >>> unwind stack type:0 next_sp: (null) mask:0x6 graph_idx:0 >>> 8801db4075a8: 8801db407630 (0x8801db407630) >>> 8801db4075b0: 8128a84e (__save_stack_trace+0x6e/0xd0) >>> 8801db4075b8: ... >>> 8801db4075c0: 8801c1e58000 (0x8801c1e58000) >>> 8801db4075c8: 8801c1e6 (0x8801c1e6) >>> 8801db4075d0: ... >>> 8801db4075d8: 0006 (0x6) >>> 8801db4075e0: 8801c1e4e000 (0x8801c1e4e000) >>> 8801db4075e8: 0101 (0x101) >>> 8801db4075f0: ... >>> 8801db4075f8: 8801db4075a8 (0x8801db4075a8) >>> 8801db407600: 8134ff7d (__twofish_enc_blk_3way+0x1b1d/0x1b30) >> >> Looks like the x86_64 "3 way" version of Twofish >> (twofish-x86_64-asm_64-3way.S) >> needs to be updated to not use %rbp. > > > This is what is supposed to be fixed with "crypto: x86/twofish-3way - > Fix %rbp usage", right? Was it merged anywhere? > This is one of top crashers with 15K crashes. #syz fix: crypto: x86/twofish-3way - Fix %rbp usage
Re: [PATCH net-next 1/6] phy: add 2.5G SGMII mode to the phy_mode enum
On Wed, Dec 27, 2017 at 11:14:41PM +0100, Antoine Tenart wrote: > This patch adds one more generic PHY mode to the phy_mode enum, to allow > configuring generic PHYs to the 2.5G SGMII mode by using the set_mode > callback. > > Signed-off-by: Antoine Tenart> --- > include/linux/phy/phy.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h > index 4f8423a948d5..70459a28f3a1 100644 > --- a/include/linux/phy/phy.h > +++ b/include/linux/phy/phy.h > @@ -28,6 +28,7 @@ enum phy_mode { > PHY_MODE_USB_DEVICE, > PHY_MODE_USB_OTG, > PHY_MODE_SGMII, > + PHY_MODE_SGMII_2_5G, > PHY_MODE_10GKR, > PHY_MODE_UFS_HS_A, > PHY_MODE_UFS_HS_B, Hi Antoine There was a discussion maybe last month about adding 2.5G SGMII. I would prefer 2500SGMII. Putting the number first makes it uniform with the other defines, 1000BASEX, 25000BASEX, 10GKR. Andrew
Re: [PATCH net-next 1/6] phy: add 2.5G SGMII mode to the phy_mode enum
On Wed, Dec 27, 2017 at 11:14:41PM +0100, Antoine Tenart wrote: > This patch adds one more generic PHY mode to the phy_mode enum, to allow > configuring generic PHYs to the 2.5G SGMII mode by using the set_mode > callback. > > Signed-off-by: Antoine Tenart > --- > include/linux/phy/phy.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h > index 4f8423a948d5..70459a28f3a1 100644 > --- a/include/linux/phy/phy.h > +++ b/include/linux/phy/phy.h > @@ -28,6 +28,7 @@ enum phy_mode { > PHY_MODE_USB_DEVICE, > PHY_MODE_USB_OTG, > PHY_MODE_SGMII, > + PHY_MODE_SGMII_2_5G, > PHY_MODE_10GKR, > PHY_MODE_UFS_HS_A, > PHY_MODE_UFS_HS_B, Hi Antoine There was a discussion maybe last month about adding 2.5G SGMII. I would prefer 2500SGMII. Putting the number first makes it uniform with the other defines, 1000BASEX, 25000BASEX, 10GKR. Andrew
Re: [PATCH] Staging: ipx: fixed several no space before tabs coding style issues
On Wed, Dec 27, 2017 at 09:25:44PM +, Jianshen Liu wrote: > Fixed several coding style warnings of "please, no space before tabs". > > Signed-off-by: Jianshen Liu> --- > drivers/staging/ipx/af_ipx.c| 56 > - > drivers/staging/ipx/ipx_proc.c | 2 +- > drivers/staging/ipx/ipx_route.c | 6 ++--- > 3 files changed, 32 insertions(+), 32 deletions(-) Please read drivers/staging/ipx/TODO
Re: [PATCH] Staging: ipx: fixed several no space before tabs coding style issues
On Wed, Dec 27, 2017 at 09:25:44PM +, Jianshen Liu wrote: > Fixed several coding style warnings of "please, no space before tabs". > > Signed-off-by: Jianshen Liu > --- > drivers/staging/ipx/af_ipx.c| 56 > - > drivers/staging/ipx/ipx_proc.c | 2 +- > drivers/staging/ipx/ipx_route.c | 6 ++--- > 3 files changed, 32 insertions(+), 32 deletions(-) Please read drivers/staging/ipx/TODO
Re: [PATCH 1/2] nvmem: add driver for JZ4780 efuse
Hi Mathieu, PrasannaKumar, On 27.12.2017 13:27, Mathieu Malaterre wrote: From: PrasannaKumar MuralidharanThis patch brings support for the JZ4780 efuse. Currently it only expose a read only access to the entire 8K bits efuse memory. Tested-by: Mathieu Malaterre Signed-off-by: PrasannaKumar Muralidharan --- + +/* main entry point */ +static int jz4780_efuse_read(void *context, unsigned int offset, + void *val, size_t bytes) +{ + static const int nsegments = sizeof(segments) / sizeof(*segments); + struct jz4780_efuse *efuse = context; + char buf[32]; + char *cur = val; + int i; + /* PM recommends read/write each segment separately */ + for (i = 0; i < nsegments; ++i) { + unsigned int *segment = segments[i]; + unsigned int lpos = segment[0]; + unsigned int buflen = segment[1] / 8; + unsigned int ncount = buflen / 32; + unsigned int remain = buflen % 32; + int j; This doesn't look right, as offset & bytes are completely ignored. This means it will return data from an offset other than requested and may also overrun the provided output buffer? + /* EFUSE can read or write maximum 256bit in each time */ + for (j = 0; j < ncount ; ++j) { + jz4780_efuse_read_32bytes(efuse, buf, lpos); + memcpy(cur, buf, sizeof(buf)); + cur += sizeof(buf); + lpos += sizeof(buf); + } + if (remain) { + jz4780_efuse_read_32bytes(efuse, buf, lpos); + memcpy(cur, buf, remain); + cur += remain; + } + } + + return 0; +} Marcin
Re: [PATCH 1/2] nvmem: add driver for JZ4780 efuse
Hi Mathieu, PrasannaKumar, On 27.12.2017 13:27, Mathieu Malaterre wrote: From: PrasannaKumar Muralidharan This patch brings support for the JZ4780 efuse. Currently it only expose a read only access to the entire 8K bits efuse memory. Tested-by: Mathieu Malaterre Signed-off-by: PrasannaKumar Muralidharan --- + +/* main entry point */ +static int jz4780_efuse_read(void *context, unsigned int offset, + void *val, size_t bytes) +{ + static const int nsegments = sizeof(segments) / sizeof(*segments); + struct jz4780_efuse *efuse = context; + char buf[32]; + char *cur = val; + int i; + /* PM recommends read/write each segment separately */ + for (i = 0; i < nsegments; ++i) { + unsigned int *segment = segments[i]; + unsigned int lpos = segment[0]; + unsigned int buflen = segment[1] / 8; + unsigned int ncount = buflen / 32; + unsigned int remain = buflen % 32; + int j; This doesn't look right, as offset & bytes are completely ignored. This means it will return data from an offset other than requested and may also overrun the provided output buffer? + /* EFUSE can read or write maximum 256bit in each time */ + for (j = 0; j < ncount ; ++j) { + jz4780_efuse_read_32bytes(efuse, buf, lpos); + memcpy(cur, buf, sizeof(buf)); + cur += sizeof(buf); + lpos += sizeof(buf); + } + if (remain) { + jz4780_efuse_read_32bytes(efuse, buf, lpos); + memcpy(cur, buf, remain); + cur += remain; + } + } + + return 0; +} Marcin
Re: [PATCH] crypto: stm32 - Use standard CONFIG name
On Wed, Dec 20, 2017 at 06:19:32PM +, Corentin Labbe wrote: > All hardware crypto devices have their CONFIG names using the following > convention: > CRYPTO_DEV_name_algo > > This patch apply this conventions on STM32 CONFIG names. > > Signed-off-by: Corentin LabbePatch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: stm32 - Use standard CONFIG name
On Wed, Dec 20, 2017 at 06:19:32PM +, Corentin Labbe wrote: > All hardware crypto devices have their CONFIG names using the following > convention: > CRYPTO_DEV_name_algo > > This patch apply this conventions on STM32 CONFIG names. > > Signed-off-by: Corentin Labbe Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v2 0/6] crypto: tcrypt: fix and add multi buf speed tests
On Sun, Dec 17, 2017 at 08:28:59AM +, Gilad Ben-Yossef wrote: > The performance of some crypto tfm providers is affected by > the amount of parallelism possible with the processing. > > We already had some support for speed test of multiple concurrent > requests, dubbed multi buffer, in ahash speed tests. > > > This patch set extends said support and add similar support for skcipher > and AEAD, as well as fixes some odd bugs discovered along the way. > > > It is noted that it is possible to consolidate some of the none multi > buffer speed test code better, but given that tcrypt as a whole is a > developer testing harness rather than production code, the value of > this activity seems questionable. Do let me know if you disagree. > > Signed-off-by: Gilad Ben-YossefAll applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: x86/twofish-3way - Fix %rbp usage
On Mon, Dec 18, 2017 at 04:40:26PM -0800, Eric Biggers wrote: > From: Eric Biggers> > Using %rbp as a temporary register breaks frame pointer convention and > breaks stack traces when unwinding from an interrupt in the crypto code. > > In twofish-3way, we can't simply replace %rbp with another register > because there are none available. Instead, we use the stack to hold the > values that %rbp, %r11, and %r12 were holding previously. Each of these > values represents the half of the output from the previous Feistel round > that is being passed on unchanged to the following round. They are only > used once per round, when they are exchanged with %rax, %rbx, and %rcx. > > As a result, we free up 3 registers (one per block) and can reassign > them so that %rbp is not used, and additionally %r14 and %r15 are not > used so they do not need to be saved/restored. > > There may be a small overhead caused by replacing 'xchg REG, REG' with > the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per > round. But, counterintuitively, when I tested "ctr-twofish-3way" on a > Haswell processor, the new version was actually about 2% faster. > (Perhaps 'xchg' is not as well optimized as plain moves.) > > Reported-by: syzbot > Signed-off-by: Eric Biggers Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v2 0/6] crypto: tcrypt: fix and add multi buf speed tests
On Sun, Dec 17, 2017 at 08:28:59AM +, Gilad Ben-Yossef wrote: > The performance of some crypto tfm providers is affected by > the amount of parallelism possible with the processing. > > We already had some support for speed test of multiple concurrent > requests, dubbed multi buffer, in ahash speed tests. > > > This patch set extends said support and add similar support for skcipher > and AEAD, as well as fixes some odd bugs discovered along the way. > > > It is noted that it is possible to consolidate some of the none multi > buffer speed test code better, but given that tcrypt as a whole is a > developer testing harness rather than production code, the value of > this activity seems questionable. Do let me know if you disagree. > > Signed-off-by: Gilad Ben-Yossef All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: x86/twofish-3way - Fix %rbp usage
On Mon, Dec 18, 2017 at 04:40:26PM -0800, Eric Biggers wrote: > From: Eric Biggers > > Using %rbp as a temporary register breaks frame pointer convention and > breaks stack traces when unwinding from an interrupt in the crypto code. > > In twofish-3way, we can't simply replace %rbp with another register > because there are none available. Instead, we use the stack to hold the > values that %rbp, %r11, and %r12 were holding previously. Each of these > values represents the half of the output from the previous Feistel round > that is being passed on unchanged to the following round. They are only > used once per round, when they are exchanged with %rax, %rbx, and %rcx. > > As a result, we free up 3 registers (one per block) and can reassign > them so that %rbp is not used, and additionally %r14 and %r15 are not > used so they do not need to be saved/restored. > > There may be a small overhead caused by replacing 'xchg REG, REG' with > the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per > round. But, counterintuitively, when I tested "ctr-twofish-3way" on a > Haswell processor, the new version was actually about 2% faster. > (Perhaps 'xchg' is not as well optimized as plain moves.) > > Reported-by: syzbot > Signed-off-by: Eric Biggers Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH net-next v9 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE
DT bindings for the AVE ethernet controller found on Socionext's UniPhier platforms. Signed-off-by: Kunihiko HayashiSigned-off-by: Jassi Brar Acked-by: Rob Herring Reviewed-by: Florian Fainelli --- .../bindings/net/socionext,uniphier-ave4.txt | 48 ++ 1 file changed, 48 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt new file mode 100644 index 000..270ea4e --- /dev/null +++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt @@ -0,0 +1,48 @@ +* Socionext AVE ethernet controller + +This describes the devicetree bindings for AVE ethernet controller +implemented on Socionext UniPhier SoCs. + +Required properties: + - compatible: Should be + - "socionext,uniphier-pro4-ave4" : for Pro4 SoC + - "socionext,uniphier-pxs2-ave4" : for PXs2 SoC + - "socionext,uniphier-ld11-ave4" : for LD11 SoC + - "socionext,uniphier-ld20-ave4" : for LD20 SoC + - reg: Address where registers are mapped and size of region. + - interrupts: Should contain the MAC interrupt. + - phy-mode: See ethernet.txt in the same directory. Allow to choose + "rgmii", "rmii", or "mii" according to the PHY. + - phy-handle: Should point to the external phy device. + See ethernet.txt file in the same directory. + - clocks: A phandle to the clock for the MAC. + +Optional properties: + - resets: A phandle to the reset control for the MAC. + - local-mac-address: See ethernet.txt in the same directory. + +Required subnode: + - mdio: A container for child nodes representing phy nodes. + See phy.txt in the same directory. + +Example: + + ether: ethernet@6500 { + compatible = "socionext,uniphier-ld20-ave4"; + reg = <0x6500 0x8500>; + interrupts = <0 66 4>; + phy-mode = "rgmii"; + phy-handle = <>; + clocks = <_clk 6>; + resets = <_rst 6>; + local-mac-address = [00 00 00 00 00 00]; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + ethphy: ethphy@1 { + reg = <1>; + }; + }; + }; -- 2.7.4
[PATCH net-next v9 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE
DT bindings for the AVE ethernet controller found on Socionext's UniPhier platforms. Signed-off-by: Kunihiko Hayashi Signed-off-by: Jassi Brar Acked-by: Rob Herring Reviewed-by: Florian Fainelli --- .../bindings/net/socionext,uniphier-ave4.txt | 48 ++ 1 file changed, 48 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt new file mode 100644 index 000..270ea4e --- /dev/null +++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt @@ -0,0 +1,48 @@ +* Socionext AVE ethernet controller + +This describes the devicetree bindings for AVE ethernet controller +implemented on Socionext UniPhier SoCs. + +Required properties: + - compatible: Should be + - "socionext,uniphier-pro4-ave4" : for Pro4 SoC + - "socionext,uniphier-pxs2-ave4" : for PXs2 SoC + - "socionext,uniphier-ld11-ave4" : for LD11 SoC + - "socionext,uniphier-ld20-ave4" : for LD20 SoC + - reg: Address where registers are mapped and size of region. + - interrupts: Should contain the MAC interrupt. + - phy-mode: See ethernet.txt in the same directory. Allow to choose + "rgmii", "rmii", or "mii" according to the PHY. + - phy-handle: Should point to the external phy device. + See ethernet.txt file in the same directory. + - clocks: A phandle to the clock for the MAC. + +Optional properties: + - resets: A phandle to the reset control for the MAC. + - local-mac-address: See ethernet.txt in the same directory. + +Required subnode: + - mdio: A container for child nodes representing phy nodes. + See phy.txt in the same directory. + +Example: + + ether: ethernet@6500 { + compatible = "socionext,uniphier-ld20-ave4"; + reg = <0x6500 0x8500>; + interrupts = <0 66 4>; + phy-mode = "rgmii"; + phy-handle = <>; + clocks = <_clk 6>; + resets = <_rst 6>; + local-mac-address = [00 00 00 00 00 00]; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + ethphy: ethphy@1 { + reg = <1>; + }; + }; + }; -- 2.7.4
[PATCH net-next v9 2/2] net: ethernet: socionext: add AVE ethernet driver
The UniPhier platform from Socionext provides the AVE ethernet controller that includes MAC and MDIO bus supporting RGMII/RMII modes. The controller is named AVE. Signed-off-by: Kunihiko HayashiSigned-off-by: Jassi Brar Reviewed-by: Andrew Lunn --- drivers/net/ethernet/Kconfig |1 + drivers/net/ethernet/Makefile|1 + drivers/net/ethernet/socionext/Kconfig | 22 + drivers/net/ethernet/socionext/Makefile |5 + drivers/net/ethernet/socionext/sni_ave.c | 1736 ++ 5 files changed, 1765 insertions(+) create mode 100644 drivers/net/ethernet/socionext/Kconfig create mode 100644 drivers/net/ethernet/socionext/Makefile create mode 100644 drivers/net/ethernet/socionext/sni_ave.c diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig index c604213..d50519e 100644 --- a/drivers/net/ethernet/Kconfig +++ b/drivers/net/ethernet/Kconfig @@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig" source "drivers/net/ethernet/sfc/Kconfig" source "drivers/net/ethernet/sgi/Kconfig" source "drivers/net/ethernet/smsc/Kconfig" +source "drivers/net/ethernet/socionext/Kconfig" source "drivers/net/ethernet/stmicro/Kconfig" source "drivers/net/ethernet/sun/Kconfig" source "drivers/net/ethernet/tehuti/Kconfig" diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile index 39f62733..6cf5ade 100644 --- a/drivers/net/ethernet/Makefile +++ b/drivers/net/ethernet/Makefile @@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/ obj-$(CONFIG_SFC_FALCON) += sfc/falcon/ obj-$(CONFIG_NET_VENDOR_SGI) += sgi/ obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/ +obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/ obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/ obj-$(CONFIG_NET_VENDOR_SUN) += sun/ obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/ diff --git a/drivers/net/ethernet/socionext/Kconfig b/drivers/net/ethernet/socionext/Kconfig new file mode 100644 index 000..3a1829e --- /dev/null +++ b/drivers/net/ethernet/socionext/Kconfig @@ -0,0 +1,22 @@ +config NET_VENDOR_SOCIONEXT + bool "Socionext ethernet drivers" + default y + ---help--- + Option to select ethernet drivers for Socionext platforms. + + Note that the answer to this question doesn't directly affect the + kernel: saying N will just cause the configurator to skip all + the questions about Socionext devices. If you say Y, you will be asked + for your specific card in the following questions. + +if NET_VENDOR_SOCIONEXT + +config SNI_AVE + tristate "Socionext AVE ethernet support" + depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF + select PHYLIB + ---help--- + Driver for gigabit ethernet MACs, called AVE, in the + Socionext UniPhier family. + +endif #NET_VENDOR_SOCIONEXT diff --git a/drivers/net/ethernet/socionext/Makefile b/drivers/net/ethernet/socionext/Makefile new file mode 100644 index 000..ab83df6 --- /dev/null +++ b/drivers/net/ethernet/socionext/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for all ethernet ip drivers on Socionext platforms +# +obj-$(CONFIG_SNI_AVE) += sni_ave.o diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c new file mode 100644 index 000..111e7ca --- /dev/null +++ b/drivers/net/ethernet/socionext/sni_ave.c @@ -0,0 +1,1736 @@ +// SPDX-License-Identifier: GPL-2.0 +/** + * sni_ave.c - Socionext UniPhier AVE ethernet driver + * Copyright 2014 Panasonic Corporation + * Copyright 2015-2017 Socionext Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* General Register Group */ +#define AVE_IDR0x000 /* ID */ +#define AVE_VR 0x004 /* Version */ +#define AVE_GRR0x008 /* Global Reset */ +#define AVE_CFGR 0x00c /* Configuration */ + +/* Interrupt Register Group */ +#define AVE_GIMR 0x100 /* Global Interrupt Mask */ +#define AVE_GISR 0x104 /* Global Interrupt Status */ + +/* MAC Register Group */ +#define AVE_TXCR 0x200 /* TX Setup */ +#define AVE_RXCR 0x204 /* RX Setup */ +#define AVE_RXMAC1R0x208 /* MAC address (lower) */ +#define AVE_RXMAC2R0x20c /* MAC address (upper) */ +#define AVE_MDIOCTR0x214 /* MDIO Control */ +#define AVE_MDIOAR 0x218 /* MDIO Address */ +#define AVE_MDIOWDR0x21c /* MDIO Data */ +#define AVE_MDIOSR 0x220 /* MDIO Status */ +#define AVE_MDIORDR0x224 /* MDIO Rd Data */ + +/* Descriptor Control Register Group */ +#define AVE_DESCC 0x300 /* Descriptor Control */ +#define AVE_TXDC 0x304
[PATCH net-next v9 2/2] net: ethernet: socionext: add AVE ethernet driver
The UniPhier platform from Socionext provides the AVE ethernet controller that includes MAC and MDIO bus supporting RGMII/RMII modes. The controller is named AVE. Signed-off-by: Kunihiko Hayashi Signed-off-by: Jassi Brar Reviewed-by: Andrew Lunn --- drivers/net/ethernet/Kconfig |1 + drivers/net/ethernet/Makefile|1 + drivers/net/ethernet/socionext/Kconfig | 22 + drivers/net/ethernet/socionext/Makefile |5 + drivers/net/ethernet/socionext/sni_ave.c | 1736 ++ 5 files changed, 1765 insertions(+) create mode 100644 drivers/net/ethernet/socionext/Kconfig create mode 100644 drivers/net/ethernet/socionext/Makefile create mode 100644 drivers/net/ethernet/socionext/sni_ave.c diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig index c604213..d50519e 100644 --- a/drivers/net/ethernet/Kconfig +++ b/drivers/net/ethernet/Kconfig @@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig" source "drivers/net/ethernet/sfc/Kconfig" source "drivers/net/ethernet/sgi/Kconfig" source "drivers/net/ethernet/smsc/Kconfig" +source "drivers/net/ethernet/socionext/Kconfig" source "drivers/net/ethernet/stmicro/Kconfig" source "drivers/net/ethernet/sun/Kconfig" source "drivers/net/ethernet/tehuti/Kconfig" diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile index 39f62733..6cf5ade 100644 --- a/drivers/net/ethernet/Makefile +++ b/drivers/net/ethernet/Makefile @@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/ obj-$(CONFIG_SFC_FALCON) += sfc/falcon/ obj-$(CONFIG_NET_VENDOR_SGI) += sgi/ obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/ +obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/ obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/ obj-$(CONFIG_NET_VENDOR_SUN) += sun/ obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/ diff --git a/drivers/net/ethernet/socionext/Kconfig b/drivers/net/ethernet/socionext/Kconfig new file mode 100644 index 000..3a1829e --- /dev/null +++ b/drivers/net/ethernet/socionext/Kconfig @@ -0,0 +1,22 @@ +config NET_VENDOR_SOCIONEXT + bool "Socionext ethernet drivers" + default y + ---help--- + Option to select ethernet drivers for Socionext platforms. + + Note that the answer to this question doesn't directly affect the + kernel: saying N will just cause the configurator to skip all + the questions about Socionext devices. If you say Y, you will be asked + for your specific card in the following questions. + +if NET_VENDOR_SOCIONEXT + +config SNI_AVE + tristate "Socionext AVE ethernet support" + depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF + select PHYLIB + ---help--- + Driver for gigabit ethernet MACs, called AVE, in the + Socionext UniPhier family. + +endif #NET_VENDOR_SOCIONEXT diff --git a/drivers/net/ethernet/socionext/Makefile b/drivers/net/ethernet/socionext/Makefile new file mode 100644 index 000..ab83df6 --- /dev/null +++ b/drivers/net/ethernet/socionext/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for all ethernet ip drivers on Socionext platforms +# +obj-$(CONFIG_SNI_AVE) += sni_ave.o diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c new file mode 100644 index 000..111e7ca --- /dev/null +++ b/drivers/net/ethernet/socionext/sni_ave.c @@ -0,0 +1,1736 @@ +// SPDX-License-Identifier: GPL-2.0 +/** + * sni_ave.c - Socionext UniPhier AVE ethernet driver + * Copyright 2014 Panasonic Corporation + * Copyright 2015-2017 Socionext Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* General Register Group */ +#define AVE_IDR0x000 /* ID */ +#define AVE_VR 0x004 /* Version */ +#define AVE_GRR0x008 /* Global Reset */ +#define AVE_CFGR 0x00c /* Configuration */ + +/* Interrupt Register Group */ +#define AVE_GIMR 0x100 /* Global Interrupt Mask */ +#define AVE_GISR 0x104 /* Global Interrupt Status */ + +/* MAC Register Group */ +#define AVE_TXCR 0x200 /* TX Setup */ +#define AVE_RXCR 0x204 /* RX Setup */ +#define AVE_RXMAC1R0x208 /* MAC address (lower) */ +#define AVE_RXMAC2R0x20c /* MAC address (upper) */ +#define AVE_MDIOCTR0x214 /* MDIO Control */ +#define AVE_MDIOAR 0x218 /* MDIO Address */ +#define AVE_MDIOWDR0x21c /* MDIO Data */ +#define AVE_MDIOSR 0x220 /* MDIO Status */ +#define AVE_MDIORDR0x224 /* MDIO Rd Data */ + +/* Descriptor Control Register Group */ +#define AVE_DESCC 0x300 /* Descriptor Control */ +#define AVE_TXDC 0x304 /* TX Descriptor Configuration */ +#define AVE_RXDC0 0x308 /*
[PATCH net-next v9 0/2] add UniPhier AVE ethernet support
This series adds support for Socionext AVE ethernet controller implemented on UniPhier SoCs. This driver supports RGMII/RMII modes. v8: https://www.spinics.net/lists/netdev/msg474374.html The PHY patch included in v1 has already separated in: http://www.spinics.net/lists/netdev/msg454595.html Changes since v8: - move operators at the beginning of the line to the end of the previous line - dt-bindings: add blank lines before mdio and phy subnodes Changes since v7: - dt-bindings: fix mdio subnode description Changes since v6: - sort the order of local variables from longest to shortest line - fix ave_probe() which calls register_netdev() at the end of initialization - dt-bindings: remove phy node descriptions in mdio node Changes since v5: - replace license boilerplate with SPDX Identifier - remove inline directives and an unused function Changes since v4: - fix larger integer warning on AVE_PFMBYTE_MASK0 Changes since v3: - remove checking dma address and use dma_set_mask() to restirct address - replace ave_mdio_busywait() with read_poll_timeout() - replace functions to access to registers with readl/writel() directly - replace a function to access to macaddr with ave_hw_write_macaddr() - change return value of ave_dma_map() to error value - move mdiobus_unregister() from ave_remove() to ave_uninit() - eliminate else block at the end of ave_dma_map() - add mask definitions for packet filter - sort bitmap definitions in descending order - add error check to some functions - rename and sort functions to clear sub-categories - fix error value consistency - remove unneeded initializers - change type of constant arrays Changes since v2: - replace clk_get() with devm_clk_get() - replace reset_control_get() with devm_reset_control_get_optional_shared() - add error return when the error occurs on the above *_get functions - sort soc data and compatible strings - remove clearly obvious comments - modify dt-bindings document consistent with these modifications Changes since v1: - add/remove devicetree properties and sub-node - remove "internal-phy-interrupt" and "desc-bits" property - add SoC data structures based on compatible strings - add node operation to apply "mdio" sub-node - add support for features - add support for {get,set}_pauseparam and pause frame operations - add support for ndo_get_stats64 instead of ndo_get_stats - replace with desiable functions - replace check for valid phy_mode with phy_interface{_mode}_is_rgmii() - replace phy attach message with phy_attached_info() - replace 32bit operation with {upper,lower}_32_bits() on ave_wdesc_addr() - replace nway_reset and get_link with generic functions - move operations to proper functions - move phy_start_aneg() to ndo_open, and remove unnecessary PHY interrupt operations See http://www.spinics.net/lists/netdev/msg454590.html - move irq initialization and descriptor memory allocation to ndo_open - move initialization of reset and clock and mdiobus to ndo_init - fix skbuffer operations - fix skb alignment operations and add Rx buffer adjustment for descriptor See http://www.spinics.net/lists/netdev/msg456014.html - add error returns when dma_map_single() failed - clean up code structures - clean up wait-loop and wake-queue conditions - add ave_wdesc_addr() and offset definitions - add ave_macaddr_init() to clean up mac-address operation - fix checking whether Tx entry is not enough - fix supported features of phydev - add necessary free/disable operations - add phydev check on ave_{get,set}_wol() - remove netif_carrier functions, phydev initializer, and Tx budget check - change obsolate codes - replace ndev->{base_addr,irq} with the members of ave_private - rename goto labels and mask definitions, and remove unused codes Kunihiko Hayashi (2): dt-bindings: net: add DT bindings for Socionext UniPhier AVE net: ethernet: socionext: add AVE ethernet driver .../bindings/net/socionext,uniphier-ave4.txt | 48 + drivers/net/ethernet/Kconfig |1 + drivers/net/ethernet/Makefile |1 + drivers/net/ethernet/socionext/Kconfig | 22 + drivers/net/ethernet/socionext/Makefile|5 + drivers/net/ethernet/socionext/sni_ave.c | 1736 6 files changed, 1813 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt create mode 100644 drivers/net/ethernet/socionext/Kconfig create mode 100644 drivers/net/ethernet/socionext/Makefile create mode 100644 drivers/net/ethernet/socionext/sni_ave.c -- 2.7.4
[PATCH net-next v9 0/2] add UniPhier AVE ethernet support
This series adds support for Socionext AVE ethernet controller implemented on UniPhier SoCs. This driver supports RGMII/RMII modes. v8: https://www.spinics.net/lists/netdev/msg474374.html The PHY patch included in v1 has already separated in: http://www.spinics.net/lists/netdev/msg454595.html Changes since v8: - move operators at the beginning of the line to the end of the previous line - dt-bindings: add blank lines before mdio and phy subnodes Changes since v7: - dt-bindings: fix mdio subnode description Changes since v6: - sort the order of local variables from longest to shortest line - fix ave_probe() which calls register_netdev() at the end of initialization - dt-bindings: remove phy node descriptions in mdio node Changes since v5: - replace license boilerplate with SPDX Identifier - remove inline directives and an unused function Changes since v4: - fix larger integer warning on AVE_PFMBYTE_MASK0 Changes since v3: - remove checking dma address and use dma_set_mask() to restirct address - replace ave_mdio_busywait() with read_poll_timeout() - replace functions to access to registers with readl/writel() directly - replace a function to access to macaddr with ave_hw_write_macaddr() - change return value of ave_dma_map() to error value - move mdiobus_unregister() from ave_remove() to ave_uninit() - eliminate else block at the end of ave_dma_map() - add mask definitions for packet filter - sort bitmap definitions in descending order - add error check to some functions - rename and sort functions to clear sub-categories - fix error value consistency - remove unneeded initializers - change type of constant arrays Changes since v2: - replace clk_get() with devm_clk_get() - replace reset_control_get() with devm_reset_control_get_optional_shared() - add error return when the error occurs on the above *_get functions - sort soc data and compatible strings - remove clearly obvious comments - modify dt-bindings document consistent with these modifications Changes since v1: - add/remove devicetree properties and sub-node - remove "internal-phy-interrupt" and "desc-bits" property - add SoC data structures based on compatible strings - add node operation to apply "mdio" sub-node - add support for features - add support for {get,set}_pauseparam and pause frame operations - add support for ndo_get_stats64 instead of ndo_get_stats - replace with desiable functions - replace check for valid phy_mode with phy_interface{_mode}_is_rgmii() - replace phy attach message with phy_attached_info() - replace 32bit operation with {upper,lower}_32_bits() on ave_wdesc_addr() - replace nway_reset and get_link with generic functions - move operations to proper functions - move phy_start_aneg() to ndo_open, and remove unnecessary PHY interrupt operations See http://www.spinics.net/lists/netdev/msg454590.html - move irq initialization and descriptor memory allocation to ndo_open - move initialization of reset and clock and mdiobus to ndo_init - fix skbuffer operations - fix skb alignment operations and add Rx buffer adjustment for descriptor See http://www.spinics.net/lists/netdev/msg456014.html - add error returns when dma_map_single() failed - clean up code structures - clean up wait-loop and wake-queue conditions - add ave_wdesc_addr() and offset definitions - add ave_macaddr_init() to clean up mac-address operation - fix checking whether Tx entry is not enough - fix supported features of phydev - add necessary free/disable operations - add phydev check on ave_{get,set}_wol() - remove netif_carrier functions, phydev initializer, and Tx budget check - change obsolate codes - replace ndev->{base_addr,irq} with the members of ave_private - rename goto labels and mask definitions, and remove unused codes Kunihiko Hayashi (2): dt-bindings: net: add DT bindings for Socionext UniPhier AVE net: ethernet: socionext: add AVE ethernet driver .../bindings/net/socionext,uniphier-ave4.txt | 48 + drivers/net/ethernet/Kconfig |1 + drivers/net/ethernet/Makefile |1 + drivers/net/ethernet/socionext/Kconfig | 22 + drivers/net/ethernet/socionext/Makefile|5 + drivers/net/ethernet/socionext/sni_ave.c | 1736 6 files changed, 1813 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt create mode 100644 drivers/net/ethernet/socionext/Kconfig create mode 100644 drivers/net/ethernet/socionext/Makefile create mode 100644 drivers/net/ethernet/socionext/sni_ave.c -- 2.7.4
Re: [PATCH] padata: add SPDX identifier
Cheah Kok Cheongwrote: > Add SPDX license identifier according to the type of license text found > in the file. > > Cc: Philippe Ombredanne > Signed-off-by: Cheah Kok Cheong > --- > kernel/padata.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/padata.c b/kernel/padata.c > index e265953..eb9a9d9 100644 > --- a/kernel/padata.c > +++ b/kernel/padata.c > @@ -1,3 +1,4 @@ > +// SPDX-License-Identifier: GPL-2.0 > /* > * padata.c - generic interface to process data streams in parallel > * Steffen, are you OK with this patch? Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] padata: add SPDX identifier
Cheah Kok Cheong wrote: > Add SPDX license identifier according to the type of license text found > in the file. > > Cc: Philippe Ombredanne > Signed-off-by: Cheah Kok Cheong > --- > kernel/padata.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/padata.c b/kernel/padata.c > index e265953..eb9a9d9 100644 > --- a/kernel/padata.c > +++ b/kernel/padata.c > @@ -1,3 +1,4 @@ > +// SPDX-License-Identifier: GPL-2.0 > /* > * padata.c - generic interface to process data streams in parallel > * Steffen, are you OK with this patch? Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Hello, On (12/21/17 23:19), Steven Rostedt wrote: [..] > > I wrote this before but this isn't a theoretical problem. We see > > these stalls a lot. Preemption isn't enabled to begin with. Memory > > pressure is high and OOM triggers and printk starts printing out OOM > > warnings; then, a network packet comes in which triggers allocations > > in the network layer, which fails due to memory pressure, which then > > generates memory allocation failure messages which then generates > > netconsole packets which then tries to allocate more memory and so on. > > It doesn't matter if preemption is enabled or not. The hand off should > happen either way. preemption does matter. it matters so much, that it makes the first thing your patch depends on to be questionable - if CPUA stuck in console_unlock() printing other CPU's messages, it's because there are currently printk()-s happening on CPUB-CPUZ. which is not always true. with preemption enabled CPUA can stuck in console_unlock() because printk()-s from CPUB-CPUZ could have happened seconds or even minutes ago. I have demonstrated it with the traces; it didn't convenience you. OK, I sat down and went trough Tetsuo's reports starting from 2016: https://marc.info/?l=linux-kernel=151375384500555 and Tetsuo has reported several times that printing CPU can sleep for minutes with console_sem being locked; while other CPUs are happy to printk()->log_store() as much as they want. and that's why I believe that that "The hand off should happen either way" is dangerous, especially when we hand off from a preemptible context to an atomic context. you don't think that this is a problem, because your patch requires logbuf to be small enough for any atomic context (irq, under spin_lock, etc) to be able to print out all the pending messages to all registered consoles [we print to consoles sequentially] within watchdog_threshold seconds (10 seconds by default). who does adjust the size of the logbuf in a way that console_unlock() can print the entire logbuf under 10 seconds? > > It's just that there's no one else to give that flushing duty too, so > > the pingpoinging that your patch implements can't really help > > anything. > > > > You argue that it isn't made worse by your patch, which may be true > > but your patch doesn't solve actual problems and is most likely > > unnecessary complication which gets in the way for the actual > > solution. It's a weird argument to push an approach which is > > fundamentally broken. Let's please stop that. > > BULLSHIT! > > It's not a complex solution, and coming from the cgroup and workqueue > maintainer, that's a pretty ironic comment. > > It has already been proven that it can solve problems: > > http://lkml.kernel.org/r/20171219143147.gb15...@dhcp22.suse.cz and Tetsuo also said that his test does not cover all the scenarios; and he has sent us all a pretty clear warning: : Therefore, while it is true that any approach would survive my : environment, it is dangerous to assume that any approach is safe : for my customer's enterprise servers. https://marc.info/?l=linux-kernel=151377161705573 when it comes to lockups, printk() design is so flexible, that one can justify nearly every patch no matter how many regressions it introduces, by simply saying: "but the same lockup scenario could have happened even before my patch. it's just printk() was designed this way. you were lucky enough not to hit the problem before; now you are less lucky". been there, done that. it's a trap. > You don't think handing off printks to an offloaded thread isn't more > complex nor can it cause even more issues (like more likely to lose > relevant information on kernel crashes)? printk_kthread *used* to be way too independent. basically what we had before was bad_function() printk() vprintk_emit() { if (oops_in_progress) can_offload_to_printk = false; if (can_offload_to_printk) wake_up(printk_kthread); else if (console_trylock()) console_unlock(); } the bad things: - first, we do not take into account the fact that printk_kthread can already be active. - second, we do not take into account the fact that printk_kthread can be way-way behind - a huge gap between `console_seq' and `log_next_seq'. - third, we do not take into account the fact that printk_kthread can be preempted under console_sem. so, IOW, the CPU which was in trouble declared the "oh, we should not offload to printk kthread" emergency only for itself, basically. the CPU which printk_kthread was running on or sleeping on [preemption] did not care, neither did printk_kthread. what we have now: - printk_kthread cannot be preempted under console_sem. if it acquires the console_sem it stays RUNNING, printing the messages. - printk_kthread does check for emergency on other CPUs. right
Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Hello, On (12/21/17 23:19), Steven Rostedt wrote: [..] > > I wrote this before but this isn't a theoretical problem. We see > > these stalls a lot. Preemption isn't enabled to begin with. Memory > > pressure is high and OOM triggers and printk starts printing out OOM > > warnings; then, a network packet comes in which triggers allocations > > in the network layer, which fails due to memory pressure, which then > > generates memory allocation failure messages which then generates > > netconsole packets which then tries to allocate more memory and so on. > > It doesn't matter if preemption is enabled or not. The hand off should > happen either way. preemption does matter. it matters so much, that it makes the first thing your patch depends on to be questionable - if CPUA stuck in console_unlock() printing other CPU's messages, it's because there are currently printk()-s happening on CPUB-CPUZ. which is not always true. with preemption enabled CPUA can stuck in console_unlock() because printk()-s from CPUB-CPUZ could have happened seconds or even minutes ago. I have demonstrated it with the traces; it didn't convenience you. OK, I sat down and went trough Tetsuo's reports starting from 2016: https://marc.info/?l=linux-kernel=151375384500555 and Tetsuo has reported several times that printing CPU can sleep for minutes with console_sem being locked; while other CPUs are happy to printk()->log_store() as much as they want. and that's why I believe that that "The hand off should happen either way" is dangerous, especially when we hand off from a preemptible context to an atomic context. you don't think that this is a problem, because your patch requires logbuf to be small enough for any atomic context (irq, under spin_lock, etc) to be able to print out all the pending messages to all registered consoles [we print to consoles sequentially] within watchdog_threshold seconds (10 seconds by default). who does adjust the size of the logbuf in a way that console_unlock() can print the entire logbuf under 10 seconds? > > It's just that there's no one else to give that flushing duty too, so > > the pingpoinging that your patch implements can't really help > > anything. > > > > You argue that it isn't made worse by your patch, which may be true > > but your patch doesn't solve actual problems and is most likely > > unnecessary complication which gets in the way for the actual > > solution. It's a weird argument to push an approach which is > > fundamentally broken. Let's please stop that. > > BULLSHIT! > > It's not a complex solution, and coming from the cgroup and workqueue > maintainer, that's a pretty ironic comment. > > It has already been proven that it can solve problems: > > http://lkml.kernel.org/r/20171219143147.gb15...@dhcp22.suse.cz and Tetsuo also said that his test does not cover all the scenarios; and he has sent us all a pretty clear warning: : Therefore, while it is true that any approach would survive my : environment, it is dangerous to assume that any approach is safe : for my customer's enterprise servers. https://marc.info/?l=linux-kernel=151377161705573 when it comes to lockups, printk() design is so flexible, that one can justify nearly every patch no matter how many regressions it introduces, by simply saying: "but the same lockup scenario could have happened even before my patch. it's just printk() was designed this way. you were lucky enough not to hit the problem before; now you are less lucky". been there, done that. it's a trap. > You don't think handing off printks to an offloaded thread isn't more > complex nor can it cause even more issues (like more likely to lose > relevant information on kernel crashes)? printk_kthread *used* to be way too independent. basically what we had before was bad_function() printk() vprintk_emit() { if (oops_in_progress) can_offload_to_printk = false; if (can_offload_to_printk) wake_up(printk_kthread); else if (console_trylock()) console_unlock(); } the bad things: - first, we do not take into account the fact that printk_kthread can already be active. - second, we do not take into account the fact that printk_kthread can be way-way behind - a huge gap between `console_seq' and `log_next_seq'. - third, we do not take into account the fact that printk_kthread can be preempted under console_sem. so, IOW, the CPU which was in trouble declared the "oh, we should not offload to printk kthread" emergency only for itself, basically. the CPU which printk_kthread was running on or sleeping on [preemption] did not care, neither did printk_kthread. what we have now: - printk_kthread cannot be preempted under console_sem. if it acquires the console_sem it stays RUNNING, printing the messages. - printk_kthread does check for emergency on other CPUs. right
RE: [PATCH v2] ftrace/module: Move ftrace_release_mod to ddebug_cleanup label
Hello Steve, As per the discussion I sent patch v2. Please find. Thanks, Namit Gupta - Original Message - Sender : NAMIT GUPTA./Senior Chief Engineer/SRI-Delhi-Platform S/W 1 Team/Samsung Electronics Date : 2017-12-12 17:29 (GMT+5:30) Title : [PATCH v2] ftrace/module: Move ftrace_release_mod to ddebug_cleanup label ftrace_module_init happen after dynamic_debug_setup, it is desired that cleanup should be called after this label however in current implementation it is called in free module label,ie:even though ftrace in not initialized, from so many fail case ftrace_release_mod() will be called and unnecessary traverse the whole list. In below patch we moved ftrace_release_mod() from free_module label to ddebug_cleanup label. that is the best possible location, other solution is to make new label to ftrace_release_mod() but since ftrace_module_init() is not return with minimum changes it should be in ddebug_cleanup label. Signed-off-by: Namit Gupta --- kernel/module.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/kernel/module.c b/kernel/module.c index 0d1cb8d..4be966a 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3523,6 +3523,7 @@ static int load_module(struct load_info *info, const char __user *uargs, unset_module_core_ro_nx(mod); ddebug_cleanup: +ftrace_release_mod(mod); dynamic_debug_remove(info->debug); synchronize_sched(); kfree(mod->args); @@ -3541,12 +3542,6 @@ static int load_module(struct load_info *info, const char __user *uargs, synchronize_rcu(); mutex_unlock(_mutex); free_module: -/* - * Ftrace needs to clean up what it initialized. - * This does nothing if ftrace_module_init() wasn't called, - * but it must be called outside of module_mutex. - */ -ftrace_release_mod(mod); /* Free lock-classes; relies on the preceding sync_rcu() */ lockdep_free_key_range(mod->module_core, mod->core_size); -- 1.9.1
RE: [PATCH v2] ftrace/module: Move ftrace_release_mod to ddebug_cleanup label
Hello Steve, As per the discussion I sent patch v2. Please find. Thanks, Namit Gupta - Original Message - Sender : NAMIT GUPTA ./Senior Chief Engineer/SRI-Delhi-Platform S/W 1 Team/Samsung Electronics Date : 2017-12-12 17:29 (GMT+5:30) Title : [PATCH v2] ftrace/module: Move ftrace_release_mod to ddebug_cleanup label ftrace_module_init happen after dynamic_debug_setup, it is desired that cleanup should be called after this label however in current implementation it is called in free module label,ie:even though ftrace in not initialized, from so many fail case ftrace_release_mod() will be called and unnecessary traverse the whole list. In below patch we moved ftrace_release_mod() from free_module label to ddebug_cleanup label. that is the best possible location, other solution is to make new label to ftrace_release_mod() but since ftrace_module_init() is not return with minimum changes it should be in ddebug_cleanup label. Signed-off-by: Namit Gupta --- kernel/module.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/kernel/module.c b/kernel/module.c index 0d1cb8d..4be966a 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3523,6 +3523,7 @@ static int load_module(struct load_info *info, const char __user *uargs, unset_module_core_ro_nx(mod); ddebug_cleanup: +ftrace_release_mod(mod); dynamic_debug_remove(info->debug); synchronize_sched(); kfree(mod->args); @@ -3541,12 +3542,6 @@ static int load_module(struct load_info *info, const char __user *uargs, synchronize_rcu(); mutex_unlock(_mutex); free_module: -/* - * Ftrace needs to clean up what it initialized. - * This does nothing if ftrace_module_init() wasn't called, - * but it must be called outside of module_mutex. - */ -ftrace_release_mod(mod); /* Free lock-classes; relies on the preceding sync_rcu() */ lockdep_free_key_range(mod->module_core, mod->core_size); -- 1.9.1
[PATCH net-next v6 5/6] net: dccp: Add DCCP sendmsg trace event
Add DCCP sendmsg trace event (dccp/dccp_probe) for replacing dccpprobe. User can trace this event via ftrace or perftools. Signed-off-by: Masami Hiramatsu--- Changes in v5 - Fix to add local directory to include for trace.h. Thanks Steven! --- net/dccp/Makefile |3 ++ net/dccp/proto.c |5 +++ net/dccp/trace.h | 105 + 3 files changed, 113 insertions(+) create mode 100644 net/dccp/trace.h diff --git a/net/dccp/Makefile b/net/dccp/Makefile index 2e7b56097bc4..4215f13a63af 100644 --- a/net/dccp/Makefile +++ b/net/dccp/Makefile @@ -27,3 +27,6 @@ dccp-$(CONFIG_SYSCTL) += sysctl.o dccp_diag-y := diag.o dccp_probe-y := probe.o + +# build with local directory for trace.h +CFLAGS_proto.o := -I$(src) diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 7a75a1d3568b..fa7e92e08920 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -38,6 +38,9 @@ #include "dccp.h" #include "feat.h" +#define CREATE_TRACE_POINTS +#include "trace.h" + DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly; EXPORT_SYMBOL_GPL(dccp_statistics); @@ -761,6 +764,8 @@ int dccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) int rc, size; long timeo; + trace_dccp_probe(sk, len); + if (len > dp->dccps_mss_cache) return -EMSGSIZE; diff --git a/net/dccp/trace.h b/net/dccp/trace.h new file mode 100644 index ..aa01321a6c37 --- /dev/null +++ b/net/dccp/trace.h @@ -0,0 +1,105 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM dccp + +#if !defined(_TRACE_DCCP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DCCP_H + +#include +#include "dccp.h" +#include "ccids/ccid3.h" +#include + +TRACE_EVENT(dccp_probe, + + TP_PROTO(struct sock *sk, size_t size), + + TP_ARGS(sk, size), + + TP_STRUCT__entry( + /* sockaddr_in6 is always bigger than sockaddr_in */ + __array(__u8, saddr, sizeof(struct sockaddr_in6)) + __array(__u8, daddr, sizeof(struct sockaddr_in6)) + __field(__u16, sport) + __field(__u16, dport) + __field(__u16, size) + __field(__u16, tx_s) + __field(__u32, tx_rtt) + __field(__u32, tx_p) + __field(__u32, tx_x_calc) + __field(__u64, tx_x_recv) + __field(__u64, tx_x) + __field(__u32, tx_t_ipi) + ), + + TP_fast_assign( + const struct inet_sock *inet = inet_sk(sk); + struct ccid3_hc_tx_sock *hc = NULL; + + if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3) + hc = ccid3_hc_tx_sk(sk); + + memset(__entry->saddr, 0, sizeof(struct sockaddr_in6)); + memset(__entry->daddr, 0, sizeof(struct sockaddr_in6)); + + if (sk->sk_family == AF_INET) { + struct sockaddr_in *v4 = (void *)__entry->saddr; + + v4->sin_family = AF_INET; + v4->sin_port = inet->inet_sport; + v4->sin_addr.s_addr = inet->inet_saddr; + v4 = (void *)__entry->daddr; + v4->sin_family = AF_INET; + v4->sin_port = inet->inet_dport; + v4->sin_addr.s_addr = inet->inet_daddr; +#if IS_ENABLED(CONFIG_IPV6) + } else if (sk->sk_family == AF_INET6) { + struct sockaddr_in6 *v6 = (void *)__entry->saddr; + + v6->sin6_family = AF_INET6; + v6->sin6_port = inet->inet_sport; + v6->sin6_addr = inet6_sk(sk)->saddr; + v6 = (void *)__entry->daddr; + v6->sin6_family = AF_INET6; + v6->sin6_port = inet->inet_dport; + v6->sin6_addr = sk->sk_v6_daddr; +#endif + } + + /* For filtering use */ + __entry->sport = ntohs(inet->inet_sport); + __entry->dport = ntohs(inet->inet_dport); + + __entry->size = size; + if (hc) { + __entry->tx_s = hc->tx_s; + __entry->tx_rtt = hc->tx_rtt; + __entry->tx_p = hc->tx_p; + __entry->tx_x_calc = hc->tx_x_calc; + __entry->tx_x_recv = hc->tx_x_recv >> 6; + __entry->tx_x = hc->tx_x >> 6; + __entry->tx_t_ipi = hc->tx_t_ipi; + } else { + __entry->tx_s = 0; + memset(&__entry->tx_rtt, 0, (void *)&__entry->tx_t_ipi - + (void *)&__entry->tx_rtt + + sizeof(__entry->tx_t_ipi)); + } + ), + + TP_printk("src=%pISpc dest=%pISpc
[PATCH net-next v6 4/6] net: sctp: Remove debug SCTP probe module
Remove SCTP probe module since jprobe has been deprecated. That function is now replaced by sctp/sctp_probe and sctp/sctp_probe_path trace-events. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu--- net/sctp/Kconfig | 12 --- net/sctp/Makefile |3 - net/sctp/probe.c | 244 - 3 files changed, 259 deletions(-) delete mode 100644 net/sctp/probe.c diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig index d9c04dc1b3f3..c740b189d4ba 100644 --- a/net/sctp/Kconfig +++ b/net/sctp/Kconfig @@ -37,18 +37,6 @@ menuconfig IP_SCTP if IP_SCTP -config NET_SCTPPROBE - tristate "SCTP: Association probing" -depends on PROC_FS && KPROBES ----help--- -This module allows for capturing the changes to SCTP association -state in response to incoming packets. It is used for debugging -SCTP congestion control algorithms. If you don't understand -what was just said, you don't need it: say N. - -To compile this code as a module, choose M here: the -module will be called sctp_probe. - config SCTP_DBG_OBJCNT bool "SCTP: Debug object counts" depends on PROC_FS diff --git a/net/sctp/Makefile b/net/sctp/Makefile index 54bd9c1a8aa1..6776582ec449 100644 --- a/net/sctp/Makefile +++ b/net/sctp/Makefile @@ -4,7 +4,6 @@ # obj-$(CONFIG_IP_SCTP) += sctp.o -obj-$(CONFIG_NET_SCTPPROBE) += sctp_probe.o obj-$(CONFIG_INET_SCTP_DIAG) += sctp_diag.o sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \ @@ -16,8 +15,6 @@ sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \ offload.o stream_sched.o stream_sched_prio.o \ stream_sched_rr.o stream_interleave.o -sctp_probe-y := probe.o - sctp-$(CONFIG_SCTP_DBG_OBJCNT) += objcnt.o sctp-$(CONFIG_PROC_FS) += proc.o sctp-$(CONFIG_SYSCTL) += sysctl.o diff --git a/net/sctp/probe.c b/net/sctp/probe.c deleted file mode 100644 index 1280f85a598d.. --- a/net/sctp/probe.c +++ /dev/null @@ -1,244 +0,0 @@ -/* - * sctp_probe - Observe the SCTP flow with kprobes. - * - * The idea for this came from Werner Almesberger's umlsim - * Copyright (C) 2004, Stephen Hemminger - * - * Modified for SCTP from Stephen Hemminger's code - * Copyright (C) 2010, Wei Yongjun - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -MODULE_SOFTDEP("pre: sctp"); -MODULE_AUTHOR("Wei Yongjun "); -MODULE_DESCRIPTION("SCTP snooper"); -MODULE_LICENSE("GPL"); - -static int port __read_mostly = 0; -MODULE_PARM_DESC(port, "Port to match (0=all)"); -module_param(port, int, 0); - -static unsigned int fwmark __read_mostly = 0; -MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)"); -module_param(fwmark, uint, 0); - -static int bufsize __read_mostly = 64 * 1024; -MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)"); -module_param(bufsize, int, 0); - -static int full __read_mostly = 1; -MODULE_PARM_DESC(full, "Full log (1=every ack packet received, 0=only cwnd changes)"); -module_param(full, int, 0); - -static const char procname[] = "sctpprobe"; - -static struct { - struct kfifo fifo; - spinlock_tlock; - wait_queue_head_t wait; - struct timespec64 tstart; -} sctpw; - -static __printf(1, 2) void printl(const char *fmt, ...) -{ - va_list args; - int len; - char tbuf[256]; - - va_start(args, fmt); - len = vscnprintf(tbuf, sizeof(tbuf), fmt, args); - va_end(args); - - kfifo_in_locked(, tbuf, len, ); - wake_up(); -} - -static int sctpprobe_open(struct inode *inode, struct file *file) -{ - kfifo_reset(); - ktime_get_ts64(); - - return 0; -} - -static ssize_t sctpprobe_read(struct file *file, char __user *buf, - size_t len, loff_t *ppos) -{ - int error = 0, cnt = 0; - unsigned char *tbuf; - - if (!buf) - return -EINVAL; - - if (len == 0) - return 0; - - tbuf = vmalloc(len); - if (!tbuf) - return
[PATCH net-next v6 1/6] net: tcp: Add trace events for TCP congestion window tracing
This adds an event to trace TCP stat variables with slightly intrusive trace-event. This uses ftrace/perf event log buffer to trace those state, no needs to prepare own ring-buffer, nor custom user apps. User can use ftrace to trace this event as below; # cd /sys/kernel/debug/tracing # echo 1 > events/tcp/tcp_probe/enable (run workloads) # cat trace Signed-off-by: Masami Hiramatsu--- Changes in v6: - Avoid preprocessor directives in tracepoint macro args as Mat did on net tree. --- include/trace/events/tcp.h | 97 net/ipv4/tcp_input.c |3 + 2 files changed, 100 insertions(+) diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h index 8e88a1671538..4dea6342f7d4 100644 --- a/include/trace/events/tcp.h +++ b/include/trace/events/tcp.h @@ -1,3 +1,4 @@ +/* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM tcp @@ -8,6 +9,7 @@ #include #include #include +#include /* * tcp event with arguments sk and skb @@ -277,6 +279,101 @@ TRACE_EVENT(tcp_retransmit_synack, __entry->saddr_v6, __entry->daddr_v6) ); + +#define TP_STORE_ADDR_PORTS_V4(__entry, inet, sk) \ + do {\ + struct sockaddr_in *v4 = (void *)__entry->saddr;\ + \ + v4->sin_family = AF_INET; \ + v4->sin_port = inet->inet_sport;\ + v4->sin_addr.s_addr = inet->inet_saddr; \ + v4 = (void *)__entry->daddr;\ + v4->sin_family = AF_INET; \ + v4->sin_port = inet->inet_dport;\ + v4->sin_addr.s_addr = inet->inet_daddr; \ + } while (0) + +#if IS_ENABLED(CONFIG_IPV6) + +#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \ + do {\ + if (sk->sk_family == AF_INET6) {\ + struct sockaddr_in6 *v6 = (void *)__entry->saddr; \ + \ + v6->sin6_family = AF_INET6; \ + v6->sin6_port = inet->inet_sport; \ + v6->sin6_addr = inet6_sk(sk)->saddr;\ + v6 = (void *)__entry->daddr;\ + v6->sin6_family = AF_INET6; \ + v6->sin6_port = inet->inet_dport; \ + v6->sin6_addr = sk->sk_v6_daddr;\ + } else \ + TP_STORE_ADDR_PORTS_V4(__entry, inet, sk); \ + } while (0) + +#else + +#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \ + TP_STORE_ADDR_PORTS_V4(__entry, inet, sk); + +#endif + +TRACE_EVENT(tcp_probe, + + TP_PROTO(struct sock *sk, struct sk_buff *skb), + + TP_ARGS(sk, skb), + + TP_STRUCT__entry( + /* sockaddr_in6 is always bigger than sockaddr_in */ + __array(__u8, saddr, sizeof(struct sockaddr_in6)) + __array(__u8, daddr, sizeof(struct sockaddr_in6)) + __field(__u16, sport) + __field(__u16, dport) + __field(__u32, mark) + __field(__u16, length) + __field(__u32, snd_nxt) + __field(__u32, snd_una) + __field(__u32, snd_cwnd) + __field(__u32, ssthresh) + __field(__u32, snd_wnd) + __field(__u32, srtt) + __field(__u32, rcv_wnd) + ), + + TP_fast_assign( + const struct tcp_sock *tp = tcp_sk(sk); + const struct inet_sock *inet = inet_sk(sk); + + memset(__entry->saddr, 0, sizeof(struct sockaddr_in6)); + memset(__entry->daddr, 0, sizeof(struct sockaddr_in6)); + + TP_STORE_ADDR_PORTS(__entry, inet, sk); + + /* For filtering use */ + __entry->sport = ntohs(inet->inet_sport); + __entry->dport = ntohs(inet->inet_dport); + __entry->mark = skb->mark; + + __entry->length = skb->len; + __entry->snd_nxt = tp->snd_nxt; + __entry->snd_una = tp->snd_una; + __entry->snd_cwnd = tp->snd_cwnd; + __entry->snd_wnd = tp->snd_wnd; + __entry->rcv_wnd = tp->rcv_wnd; + __entry->ssthresh = tcp_current_ssthresh(sk); + __entry->srtt = tp->srtt_us >> 3; + ), + +
[PATCH net-next v6 5/6] net: dccp: Add DCCP sendmsg trace event
Add DCCP sendmsg trace event (dccp/dccp_probe) for replacing dccpprobe. User can trace this event via ftrace or perftools. Signed-off-by: Masami Hiramatsu --- Changes in v5 - Fix to add local directory to include for trace.h. Thanks Steven! --- net/dccp/Makefile |3 ++ net/dccp/proto.c |5 +++ net/dccp/trace.h | 105 + 3 files changed, 113 insertions(+) create mode 100644 net/dccp/trace.h diff --git a/net/dccp/Makefile b/net/dccp/Makefile index 2e7b56097bc4..4215f13a63af 100644 --- a/net/dccp/Makefile +++ b/net/dccp/Makefile @@ -27,3 +27,6 @@ dccp-$(CONFIG_SYSCTL) += sysctl.o dccp_diag-y := diag.o dccp_probe-y := probe.o + +# build with local directory for trace.h +CFLAGS_proto.o := -I$(src) diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 7a75a1d3568b..fa7e92e08920 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -38,6 +38,9 @@ #include "dccp.h" #include "feat.h" +#define CREATE_TRACE_POINTS +#include "trace.h" + DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly; EXPORT_SYMBOL_GPL(dccp_statistics); @@ -761,6 +764,8 @@ int dccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) int rc, size; long timeo; + trace_dccp_probe(sk, len); + if (len > dp->dccps_mss_cache) return -EMSGSIZE; diff --git a/net/dccp/trace.h b/net/dccp/trace.h new file mode 100644 index ..aa01321a6c37 --- /dev/null +++ b/net/dccp/trace.h @@ -0,0 +1,105 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM dccp + +#if !defined(_TRACE_DCCP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DCCP_H + +#include +#include "dccp.h" +#include "ccids/ccid3.h" +#include + +TRACE_EVENT(dccp_probe, + + TP_PROTO(struct sock *sk, size_t size), + + TP_ARGS(sk, size), + + TP_STRUCT__entry( + /* sockaddr_in6 is always bigger than sockaddr_in */ + __array(__u8, saddr, sizeof(struct sockaddr_in6)) + __array(__u8, daddr, sizeof(struct sockaddr_in6)) + __field(__u16, sport) + __field(__u16, dport) + __field(__u16, size) + __field(__u16, tx_s) + __field(__u32, tx_rtt) + __field(__u32, tx_p) + __field(__u32, tx_x_calc) + __field(__u64, tx_x_recv) + __field(__u64, tx_x) + __field(__u32, tx_t_ipi) + ), + + TP_fast_assign( + const struct inet_sock *inet = inet_sk(sk); + struct ccid3_hc_tx_sock *hc = NULL; + + if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3) + hc = ccid3_hc_tx_sk(sk); + + memset(__entry->saddr, 0, sizeof(struct sockaddr_in6)); + memset(__entry->daddr, 0, sizeof(struct sockaddr_in6)); + + if (sk->sk_family == AF_INET) { + struct sockaddr_in *v4 = (void *)__entry->saddr; + + v4->sin_family = AF_INET; + v4->sin_port = inet->inet_sport; + v4->sin_addr.s_addr = inet->inet_saddr; + v4 = (void *)__entry->daddr; + v4->sin_family = AF_INET; + v4->sin_port = inet->inet_dport; + v4->sin_addr.s_addr = inet->inet_daddr; +#if IS_ENABLED(CONFIG_IPV6) + } else if (sk->sk_family == AF_INET6) { + struct sockaddr_in6 *v6 = (void *)__entry->saddr; + + v6->sin6_family = AF_INET6; + v6->sin6_port = inet->inet_sport; + v6->sin6_addr = inet6_sk(sk)->saddr; + v6 = (void *)__entry->daddr; + v6->sin6_family = AF_INET6; + v6->sin6_port = inet->inet_dport; + v6->sin6_addr = sk->sk_v6_daddr; +#endif + } + + /* For filtering use */ + __entry->sport = ntohs(inet->inet_sport); + __entry->dport = ntohs(inet->inet_dport); + + __entry->size = size; + if (hc) { + __entry->tx_s = hc->tx_s; + __entry->tx_rtt = hc->tx_rtt; + __entry->tx_p = hc->tx_p; + __entry->tx_x_calc = hc->tx_x_calc; + __entry->tx_x_recv = hc->tx_x_recv >> 6; + __entry->tx_x = hc->tx_x >> 6; + __entry->tx_t_ipi = hc->tx_t_ipi; + } else { + __entry->tx_s = 0; + memset(&__entry->tx_rtt, 0, (void *)&__entry->tx_t_ipi - + (void *)&__entry->tx_rtt + + sizeof(__entry->tx_t_ipi)); + } + ), + + TP_printk("src=%pISpc dest=%pISpc size=%d tx_s=%d
[PATCH net-next v6 4/6] net: sctp: Remove debug SCTP probe module
Remove SCTP probe module since jprobe has been deprecated. That function is now replaced by sctp/sctp_probe and sctp/sctp_probe_path trace-events. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu --- net/sctp/Kconfig | 12 --- net/sctp/Makefile |3 - net/sctp/probe.c | 244 - 3 files changed, 259 deletions(-) delete mode 100644 net/sctp/probe.c diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig index d9c04dc1b3f3..c740b189d4ba 100644 --- a/net/sctp/Kconfig +++ b/net/sctp/Kconfig @@ -37,18 +37,6 @@ menuconfig IP_SCTP if IP_SCTP -config NET_SCTPPROBE - tristate "SCTP: Association probing" -depends on PROC_FS && KPROBES ----help--- -This module allows for capturing the changes to SCTP association -state in response to incoming packets. It is used for debugging -SCTP congestion control algorithms. If you don't understand -what was just said, you don't need it: say N. - -To compile this code as a module, choose M here: the -module will be called sctp_probe. - config SCTP_DBG_OBJCNT bool "SCTP: Debug object counts" depends on PROC_FS diff --git a/net/sctp/Makefile b/net/sctp/Makefile index 54bd9c1a8aa1..6776582ec449 100644 --- a/net/sctp/Makefile +++ b/net/sctp/Makefile @@ -4,7 +4,6 @@ # obj-$(CONFIG_IP_SCTP) += sctp.o -obj-$(CONFIG_NET_SCTPPROBE) += sctp_probe.o obj-$(CONFIG_INET_SCTP_DIAG) += sctp_diag.o sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \ @@ -16,8 +15,6 @@ sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \ offload.o stream_sched.o stream_sched_prio.o \ stream_sched_rr.o stream_interleave.o -sctp_probe-y := probe.o - sctp-$(CONFIG_SCTP_DBG_OBJCNT) += objcnt.o sctp-$(CONFIG_PROC_FS) += proc.o sctp-$(CONFIG_SYSCTL) += sysctl.o diff --git a/net/sctp/probe.c b/net/sctp/probe.c deleted file mode 100644 index 1280f85a598d.. --- a/net/sctp/probe.c +++ /dev/null @@ -1,244 +0,0 @@ -/* - * sctp_probe - Observe the SCTP flow with kprobes. - * - * The idea for this came from Werner Almesberger's umlsim - * Copyright (C) 2004, Stephen Hemminger - * - * Modified for SCTP from Stephen Hemminger's code - * Copyright (C) 2010, Wei Yongjun - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -MODULE_SOFTDEP("pre: sctp"); -MODULE_AUTHOR("Wei Yongjun "); -MODULE_DESCRIPTION("SCTP snooper"); -MODULE_LICENSE("GPL"); - -static int port __read_mostly = 0; -MODULE_PARM_DESC(port, "Port to match (0=all)"); -module_param(port, int, 0); - -static unsigned int fwmark __read_mostly = 0; -MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)"); -module_param(fwmark, uint, 0); - -static int bufsize __read_mostly = 64 * 1024; -MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)"); -module_param(bufsize, int, 0); - -static int full __read_mostly = 1; -MODULE_PARM_DESC(full, "Full log (1=every ack packet received, 0=only cwnd changes)"); -module_param(full, int, 0); - -static const char procname[] = "sctpprobe"; - -static struct { - struct kfifo fifo; - spinlock_tlock; - wait_queue_head_t wait; - struct timespec64 tstart; -} sctpw; - -static __printf(1, 2) void printl(const char *fmt, ...) -{ - va_list args; - int len; - char tbuf[256]; - - va_start(args, fmt); - len = vscnprintf(tbuf, sizeof(tbuf), fmt, args); - va_end(args); - - kfifo_in_locked(, tbuf, len, ); - wake_up(); -} - -static int sctpprobe_open(struct inode *inode, struct file *file) -{ - kfifo_reset(); - ktime_get_ts64(); - - return 0; -} - -static ssize_t sctpprobe_read(struct file *file, char __user *buf, - size_t len, loff_t *ppos) -{ - int error = 0, cnt = 0; - unsigned char *tbuf; - - if (!buf) - return -EINVAL; - - if (len == 0) - return 0; - - tbuf = vmalloc(len); - if (!tbuf) - return -ENOMEM; - - error = wait_event_interruptible(sctpw.wait, -
[PATCH net-next v6 1/6] net: tcp: Add trace events for TCP congestion window tracing
This adds an event to trace TCP stat variables with slightly intrusive trace-event. This uses ftrace/perf event log buffer to trace those state, no needs to prepare own ring-buffer, nor custom user apps. User can use ftrace to trace this event as below; # cd /sys/kernel/debug/tracing # echo 1 > events/tcp/tcp_probe/enable (run workloads) # cat trace Signed-off-by: Masami Hiramatsu --- Changes in v6: - Avoid preprocessor directives in tracepoint macro args as Mat did on net tree. --- include/trace/events/tcp.h | 97 net/ipv4/tcp_input.c |3 + 2 files changed, 100 insertions(+) diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h index 8e88a1671538..4dea6342f7d4 100644 --- a/include/trace/events/tcp.h +++ b/include/trace/events/tcp.h @@ -1,3 +1,4 @@ +/* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM tcp @@ -8,6 +9,7 @@ #include #include #include +#include /* * tcp event with arguments sk and skb @@ -277,6 +279,101 @@ TRACE_EVENT(tcp_retransmit_synack, __entry->saddr_v6, __entry->daddr_v6) ); + +#define TP_STORE_ADDR_PORTS_V4(__entry, inet, sk) \ + do {\ + struct sockaddr_in *v4 = (void *)__entry->saddr;\ + \ + v4->sin_family = AF_INET; \ + v4->sin_port = inet->inet_sport;\ + v4->sin_addr.s_addr = inet->inet_saddr; \ + v4 = (void *)__entry->daddr;\ + v4->sin_family = AF_INET; \ + v4->sin_port = inet->inet_dport;\ + v4->sin_addr.s_addr = inet->inet_daddr; \ + } while (0) + +#if IS_ENABLED(CONFIG_IPV6) + +#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \ + do {\ + if (sk->sk_family == AF_INET6) {\ + struct sockaddr_in6 *v6 = (void *)__entry->saddr; \ + \ + v6->sin6_family = AF_INET6; \ + v6->sin6_port = inet->inet_sport; \ + v6->sin6_addr = inet6_sk(sk)->saddr;\ + v6 = (void *)__entry->daddr;\ + v6->sin6_family = AF_INET6; \ + v6->sin6_port = inet->inet_dport; \ + v6->sin6_addr = sk->sk_v6_daddr;\ + } else \ + TP_STORE_ADDR_PORTS_V4(__entry, inet, sk); \ + } while (0) + +#else + +#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \ + TP_STORE_ADDR_PORTS_V4(__entry, inet, sk); + +#endif + +TRACE_EVENT(tcp_probe, + + TP_PROTO(struct sock *sk, struct sk_buff *skb), + + TP_ARGS(sk, skb), + + TP_STRUCT__entry( + /* sockaddr_in6 is always bigger than sockaddr_in */ + __array(__u8, saddr, sizeof(struct sockaddr_in6)) + __array(__u8, daddr, sizeof(struct sockaddr_in6)) + __field(__u16, sport) + __field(__u16, dport) + __field(__u32, mark) + __field(__u16, length) + __field(__u32, snd_nxt) + __field(__u32, snd_una) + __field(__u32, snd_cwnd) + __field(__u32, ssthresh) + __field(__u32, snd_wnd) + __field(__u32, srtt) + __field(__u32, rcv_wnd) + ), + + TP_fast_assign( + const struct tcp_sock *tp = tcp_sk(sk); + const struct inet_sock *inet = inet_sk(sk); + + memset(__entry->saddr, 0, sizeof(struct sockaddr_in6)); + memset(__entry->daddr, 0, sizeof(struct sockaddr_in6)); + + TP_STORE_ADDR_PORTS(__entry, inet, sk); + + /* For filtering use */ + __entry->sport = ntohs(inet->inet_sport); + __entry->dport = ntohs(inet->inet_dport); + __entry->mark = skb->mark; + + __entry->length = skb->len; + __entry->snd_nxt = tp->snd_nxt; + __entry->snd_una = tp->snd_una; + __entry->snd_cwnd = tp->snd_cwnd; + __entry->snd_wnd = tp->snd_wnd; + __entry->rcv_wnd = tp->rcv_wnd; + __entry->ssthresh = tcp_current_ssthresh(sk); + __entry->srtt = tp->srtt_us >> 3; + ), + + TP_printk("src=%pISpc
Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
On 2017/12/28 10:48, Gang He wrote: > Hi Alex, > > >> Hi Gang, >> >> On 2017/12/27 18:37, Gang He wrote: >>> Hi Jun, >>> >>> >> Hi Gang, Do you mean that too many retrys in loop cast losts of CPU-time and block page-fault interrupt? We should not add any delay in ocfs2_fault(), right? And I still feel a little confused why your method can solve this problem. >>> You can see the related code in function filemap_fault(), if ocfs2 fails to >> read a page since >>> it can not get a inode lock with non-block mode, the VFS layer code will >> invoke ocfs2 >>> read page call back function circularly, this will lead to a softlockup >> problem (like the below back trace). >>> So, we should get a blocking lock to let the dlm lock to this node and also >> can avoid CPU loop, >> Can we use 'cond_resched()' to allow the thread to release the CPU >> temperately for solving this softlockup? > Yes, we can use cond_resched() function to avoid this softlockup. > In fact, if the kernel is configured with CONFIG_PREEMPT=y, this softlockup > does not happen since the kernel can help. > But, this way still leads to CPU resource waste, CPU usage can reach about > 80% - 100% when > multiple nodes read/write/mmap-access the same file concurrently, and more, > the read/write/mmap-access > speed is more lower (50% decrease). > Why? > Because we need to get DLM lock for each node, before one node gets DLM lock, > another node has > to down-convert this DLM lock, that means flushing the memory data to the > disk before DLM lock down-conversion. > this disk IO operation is very slow compared with CPU cycle, that means the > node which want to get DLM lock, > will do lots of reties before another node complete down-converting this DLM > lock, actual, these retries do not make > sense, just waste CPU cycle. > So, if we add a blocking lock/unlock here, we will avoid these unnecessary > reties, especially in case slow-speed disk and more ocfs2 nodes(>=3). > I did the ocfs2 test case (multi_mmap in multiple_run.sh), after applied this > patch, the CPU rate on each node was about 40%-50%, and the test case > execution time reduced by half. > the full command is as below, > multiple_run.sh -i eth0 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C > hacluster -s pcmk -n nd1,nd2,nd3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap > /mnt/shared > the shared storage is a iscsi disk. > OK, I think it is more better if you can add you test method and result in change log. Thanks, Alex > Thanks > Gang > >> >>> second, base on my testing, the patch also can improve the efficiency in >> case modifying the same >>> file frequently from multiple nodes, since the lock acquisition chance is >> more fair. >>> In fact, the code was modified by a patch 1cce4df04f37 ("ocfs2: do not >> lock/unlock() inode DLM lock"), >>> before that patch, the code is the same, this patch can be considered to >> revert that patch, except adding more >>> clear comments. >> In patch 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock"), >> Goldwyn says blocking lock and unlock will only make >> the performance worse where contention over the locks is high, which is the >> opposite of your described above. >> IMO, blocking lock and unlock here is indeed unnecessary. >> >> Thanks, >> Alex >>> >>> Thanks >>> Gang >>> >>> thanks, Jun On 2017/12/27 17:29, Gang He wrote: > If we can't get inode lock immediately in the function > ocfs2_inode_lock_with_page() when reading a page, we should not > return directly here, since this will lead to a softlockup problem. > The method is to get a blocking lock and immediately unlock before > returning, this can avoid CPU resource waste due to lots of retries, > and benefits fairness in getting lock among multiple nodes, increase > efficiency in case modifying the same file frequently from multiple > nodes. > The softlockup problem looks like, > Kernel panic - not syncing: softlockup: hung tasks > CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > Call Trace: > > dump_stack+0x5c/0x82 > panic+0xd5/0x21e > watchdog_timer_fn+0x208/0x210 > ? watchdog_park_threads+0x70/0x70 > __hrtimer_run_queues+0xcc/0x200 > hrtimer_interrupt+0xa6/0x1f0 > smp_apic_timer_interrupt+0x34/0x50 > apic_timer_interrupt+0x96/0xa0 > > RIP: 0010:unlock_page+0x17/0x30 > RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10 > RAX: dead0100 RBX: f21e009f5300 RCX: 0004 > RDX: dead00ff RSI: 0202 RDI: f21e009f5300 > RBP: R08: R09: af154080bb00 > R10: af154080bc30 R11: 0040 R12: 993749a39518 > R13: R14: f21e009f5300 R15:
[PATCH net-next v6 2/6] net: tcp: Remove TCP probe module
Remove TCP probe module since jprobe has been deprecated. That function is now replaced by tcp/tcp_probe trace-event. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu--- net/Kconfig | 17 --- net/ipv4/Makefile|1 net/ipv4/tcp_probe.c | 301 -- 3 files changed, 319 deletions(-) delete mode 100644 net/ipv4/tcp_probe.c diff --git a/net/Kconfig b/net/Kconfig index 9dba2715919d..efe930db3c08 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -336,23 +336,6 @@ config NET_PKTGEN To compile this code as a module, choose M here: the module will be called pktgen. -config NET_TCPPROBE - tristate "TCP connection probing" - depends on INET && PROC_FS && KPROBES - ---help--- - This module allows for capturing the changes to TCP connection - state in response to incoming packets. It is used for debugging - TCP congestion avoidance modules. If you don't understand - what was just said, you don't need it: say N. - - Documentation on how to use TCP connection probing can be found - at: - - http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe - - To compile this code as a module, choose M here: the - module will be called tcp_probe. - config NET_DROP_MONITOR tristate "Network packet drop alerting service" depends on INET && TRACEPOINTS diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index c6c8ad1d4b6d..47a0a6649a9d 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -43,7 +43,6 @@ obj-$(CONFIG_INET_DIAG) += inet_diag.o obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o -obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c deleted file mode 100644 index 697f4c67b2e3.. --- a/net/ipv4/tcp_probe.c +++ /dev/null @@ -1,301 +0,0 @@ -/* - * tcpprobe - Observe the TCP flow with kprobes. - * - * The idea for this came from Werner Almesberger's umlsim - * Copyright (C) 2004, Stephen Hemminger - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include - -MODULE_AUTHOR("Stephen Hemminger "); -MODULE_DESCRIPTION("TCP cwnd snooper"); -MODULE_LICENSE("GPL"); -MODULE_VERSION("1.1"); - -static int port __read_mostly; -MODULE_PARM_DESC(port, "Port to match (0=all)"); -module_param(port, int, 0); - -static unsigned int bufsize __read_mostly = 4096; -MODULE_PARM_DESC(bufsize, "Log buffer size in packets (4096)"); -module_param(bufsize, uint, 0); - -static unsigned int fwmark __read_mostly; -MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)"); -module_param(fwmark, uint, 0); - -static int full __read_mostly; -MODULE_PARM_DESC(full, "Full log (1=every ack packet received, 0=only cwnd changes)"); -module_param(full, int, 0); - -static const char procname[] = "tcpprobe"; - -struct tcp_log { - ktime_t tstamp; - union { - struct sockaddr raw; - struct sockaddr_in v4; - struct sockaddr_in6 v6; - } src, dst; - u16 length; - u32 snd_nxt; - u32 snd_una; - u32 snd_wnd; - u32 rcv_wnd; - u32 snd_cwnd; - u32 ssthresh; - u32 srtt; -}; - -static struct { - spinlock_t lock; - wait_queue_head_t wait; - ktime_t start; - u32 lastcwnd; - - unsigned long head, tail; - struct tcp_log *log; -} tcp_probe; - -static inline int tcp_probe_used(void) -{ - return (tcp_probe.head - tcp_probe.tail) & (bufsize - 1); -} - -static inline int tcp_probe_avail(void) -{ - return bufsize - tcp_probe_used() - 1; -} - -#define tcp_probe_copy_fl_to_si4(inet, si4, mem) \ - do {\ - si4.sin_family = AF_INET;
[PATCH net-next v6 2/6] net: tcp: Remove TCP probe module
Remove TCP probe module since jprobe has been deprecated. That function is now replaced by tcp/tcp_probe trace-event. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu --- net/Kconfig | 17 --- net/ipv4/Makefile|1 net/ipv4/tcp_probe.c | 301 -- 3 files changed, 319 deletions(-) delete mode 100644 net/ipv4/tcp_probe.c diff --git a/net/Kconfig b/net/Kconfig index 9dba2715919d..efe930db3c08 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -336,23 +336,6 @@ config NET_PKTGEN To compile this code as a module, choose M here: the module will be called pktgen. -config NET_TCPPROBE - tristate "TCP connection probing" - depends on INET && PROC_FS && KPROBES - ---help--- - This module allows for capturing the changes to TCP connection - state in response to incoming packets. It is used for debugging - TCP congestion avoidance modules. If you don't understand - what was just said, you don't need it: say N. - - Documentation on how to use TCP connection probing can be found - at: - - http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe - - To compile this code as a module, choose M here: the - module will be called tcp_probe. - config NET_DROP_MONITOR tristate "Network packet drop alerting service" depends on INET && TRACEPOINTS diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index c6c8ad1d4b6d..47a0a6649a9d 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -43,7 +43,6 @@ obj-$(CONFIG_INET_DIAG) += inet_diag.o obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o -obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c deleted file mode 100644 index 697f4c67b2e3.. --- a/net/ipv4/tcp_probe.c +++ /dev/null @@ -1,301 +0,0 @@ -/* - * tcpprobe - Observe the TCP flow with kprobes. - * - * The idea for this came from Werner Almesberger's umlsim - * Copyright (C) 2004, Stephen Hemminger - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include - -MODULE_AUTHOR("Stephen Hemminger "); -MODULE_DESCRIPTION("TCP cwnd snooper"); -MODULE_LICENSE("GPL"); -MODULE_VERSION("1.1"); - -static int port __read_mostly; -MODULE_PARM_DESC(port, "Port to match (0=all)"); -module_param(port, int, 0); - -static unsigned int bufsize __read_mostly = 4096; -MODULE_PARM_DESC(bufsize, "Log buffer size in packets (4096)"); -module_param(bufsize, uint, 0); - -static unsigned int fwmark __read_mostly; -MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)"); -module_param(fwmark, uint, 0); - -static int full __read_mostly; -MODULE_PARM_DESC(full, "Full log (1=every ack packet received, 0=only cwnd changes)"); -module_param(full, int, 0); - -static const char procname[] = "tcpprobe"; - -struct tcp_log { - ktime_t tstamp; - union { - struct sockaddr raw; - struct sockaddr_in v4; - struct sockaddr_in6 v6; - } src, dst; - u16 length; - u32 snd_nxt; - u32 snd_una; - u32 snd_wnd; - u32 rcv_wnd; - u32 snd_cwnd; - u32 ssthresh; - u32 srtt; -}; - -static struct { - spinlock_t lock; - wait_queue_head_t wait; - ktime_t start; - u32 lastcwnd; - - unsigned long head, tail; - struct tcp_log *log; -} tcp_probe; - -static inline int tcp_probe_used(void) -{ - return (tcp_probe.head - tcp_probe.tail) & (bufsize - 1); -} - -static inline int tcp_probe_avail(void) -{ - return bufsize - tcp_probe_used() - 1; -} - -#define tcp_probe_copy_fl_to_si4(inet, si4, mem) \ - do {\ - si4.sin_family = AF_INET; \ - si4.sin_port = inet->inet_##mem##port;
Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
On 2017/12/28 10:48, Gang He wrote: > Hi Alex, > > >> Hi Gang, >> >> On 2017/12/27 18:37, Gang He wrote: >>> Hi Jun, >>> >>> >> Hi Gang, Do you mean that too many retrys in loop cast losts of CPU-time and block page-fault interrupt? We should not add any delay in ocfs2_fault(), right? And I still feel a little confused why your method can solve this problem. >>> You can see the related code in function filemap_fault(), if ocfs2 fails to >> read a page since >>> it can not get a inode lock with non-block mode, the VFS layer code will >> invoke ocfs2 >>> read page call back function circularly, this will lead to a softlockup >> problem (like the below back trace). >>> So, we should get a blocking lock to let the dlm lock to this node and also >> can avoid CPU loop, >> Can we use 'cond_resched()' to allow the thread to release the CPU >> temperately for solving this softlockup? > Yes, we can use cond_resched() function to avoid this softlockup. > In fact, if the kernel is configured with CONFIG_PREEMPT=y, this softlockup > does not happen since the kernel can help. > But, this way still leads to CPU resource waste, CPU usage can reach about > 80% - 100% when > multiple nodes read/write/mmap-access the same file concurrently, and more, > the read/write/mmap-access > speed is more lower (50% decrease). > Why? > Because we need to get DLM lock for each node, before one node gets DLM lock, > another node has > to down-convert this DLM lock, that means flushing the memory data to the > disk before DLM lock down-conversion. > this disk IO operation is very slow compared with CPU cycle, that means the > node which want to get DLM lock, > will do lots of reties before another node complete down-converting this DLM > lock, actual, these retries do not make > sense, just waste CPU cycle. > So, if we add a blocking lock/unlock here, we will avoid these unnecessary > reties, especially in case slow-speed disk and more ocfs2 nodes(>=3). > I did the ocfs2 test case (multi_mmap in multiple_run.sh), after applied this > patch, the CPU rate on each node was about 40%-50%, and the test case > execution time reduced by half. > the full command is as below, > multiple_run.sh -i eth0 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C > hacluster -s pcmk -n nd1,nd2,nd3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap > /mnt/shared > the shared storage is a iscsi disk. > OK, I think it is more better if you can add you test method and result in change log. Thanks, Alex > Thanks > Gang > >> >>> second, base on my testing, the patch also can improve the efficiency in >> case modifying the same >>> file frequently from multiple nodes, since the lock acquisition chance is >> more fair. >>> In fact, the code was modified by a patch 1cce4df04f37 ("ocfs2: do not >> lock/unlock() inode DLM lock"), >>> before that patch, the code is the same, this patch can be considered to >> revert that patch, except adding more >>> clear comments. >> In patch 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock"), >> Goldwyn says blocking lock and unlock will only make >> the performance worse where contention over the locks is high, which is the >> opposite of your described above. >> IMO, blocking lock and unlock here is indeed unnecessary. >> >> Thanks, >> Alex >>> >>> Thanks >>> Gang >>> >>> thanks, Jun On 2017/12/27 17:29, Gang He wrote: > If we can't get inode lock immediately in the function > ocfs2_inode_lock_with_page() when reading a page, we should not > return directly here, since this will lead to a softlockup problem. > The method is to get a blocking lock and immediately unlock before > returning, this can avoid CPU resource waste due to lots of retries, > and benefits fairness in getting lock among multiple nodes, increase > efficiency in case modifying the same file frequently from multiple > nodes. > The softlockup problem looks like, > Kernel panic - not syncing: softlockup: hung tasks > CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > Call Trace: > > dump_stack+0x5c/0x82 > panic+0xd5/0x21e > watchdog_timer_fn+0x208/0x210 > ? watchdog_park_threads+0x70/0x70 > __hrtimer_run_queues+0xcc/0x200 > hrtimer_interrupt+0xa6/0x1f0 > smp_apic_timer_interrupt+0x34/0x50 > apic_timer_interrupt+0x96/0xa0 > > RIP: 0010:unlock_page+0x17/0x30 > RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10 > RAX: dead0100 RBX: f21e009f5300 RCX: 0004 > RDX: dead00ff RSI: 0202 RDI: f21e009f5300 > RBP: R08: R09: af154080bb00 > R10: af154080bc30 R11: 0040 R12: 993749a39518 > R13: R14: f21e009f5300 R15:
[PATCH net-next v6 6/6] net: dccp: Remove dccpprobe module
Remove DCCP probe module since jprobe has been deprecated. That function is now replaced by dccp/dccp_probe trace-event. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu--- Changes in v5: - Fix a conflict with previous change in Makefile. --- net/dccp/Kconfig | 17 net/dccp/Makefile |2 - net/dccp/probe.c | 203 - 3 files changed, 222 deletions(-) delete mode 100644 net/dccp/probe.c diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig index 8c0ef71bed2f..b270e84d9c13 100644 --- a/net/dccp/Kconfig +++ b/net/dccp/Kconfig @@ -39,23 +39,6 @@ config IP_DCCP_DEBUG Just say N. -config NET_DCCPPROBE - tristate "DCCP connection probing" - depends on PROC_FS && KPROBES - ---help--- - This module allows for capturing the changes to DCCP connection - state in response to incoming packets. It is used for debugging - DCCP congestion avoidance modules. If you don't understand - what was just said, you don't need it: say N. - - Documentation on how to use DCCP connection probing can be found - at: - - http://www.linuxfoundation.org/collaborate/workgroups/networking/dccpprobe - - To compile this code as a module, choose M here: the - module will be called dccp_probe. - endmenu diff --git a/net/dccp/Makefile b/net/dccp/Makefile index 4215f13a63af..5b4ff37bc806 100644 --- a/net/dccp/Makefile +++ b/net/dccp/Makefile @@ -21,12 +21,10 @@ obj-$(subst y,$(CONFIG_IP_DCCP),$(CONFIG_IPV6)) += dccp_ipv6.o dccp_ipv6-y := ipv6.o obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o -obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o dccp-$(CONFIG_SYSCTL) += sysctl.o dccp_diag-y := diag.o -dccp_probe-y := probe.o # build with local directory for trace.h CFLAGS_proto.o := -I$(src) diff --git a/net/dccp/probe.c b/net/dccp/probe.c deleted file mode 100644 index 3d3fda05b32d.. --- a/net/dccp/probe.c +++ /dev/null @@ -1,203 +0,0 @@ -/* - * dccp_probe - Observe the DCCP flow with kprobes. - * - * The idea for this came from Werner Almesberger's umlsim - * Copyright (C) 2004, Stephen Hemminger - * - * Modified for DCCP from Stephen Hemminger's code - * Copyright (C) 2006, Ian McDonald - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "dccp.h" -#include "ccid.h" -#include "ccids/ccid3.h" - -static int port; - -static int bufsize = 64 * 1024; - -static const char procname[] = "dccpprobe"; - -static struct { - struct kfifo fifo; - spinlock_tlock; - wait_queue_head_t wait; - struct timespec64 tstart; -} dccpw; - -static void printl(const char *fmt, ...) -{ - va_list args; - int len; - struct timespec64 now; - char tbuf[256]; - - va_start(args, fmt); - getnstimeofday64(); - - now = timespec64_sub(now, dccpw.tstart); - - len = sprintf(tbuf, "%lu.%06lu ", - (unsigned long) now.tv_sec, - (unsigned long) now.tv_nsec / NSEC_PER_USEC); - len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args); - va_end(args); - - kfifo_in_locked(, tbuf, len, ); - wake_up(); -} - -static int jdccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) -{ - const struct inet_sock *inet = inet_sk(sk); - struct ccid3_hc_tx_sock *hc = NULL; - - if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3) - hc = ccid3_hc_tx_sk(sk); - - if (port == 0 || ntohs(inet->inet_dport) == port || - ntohs(inet->inet_sport) == port) { - if (hc) - printl("%pI4:%u %pI4:%u %d %d %d %d %u %llu %llu %d\n", - >inet_saddr, ntohs(inet->inet_sport), - >inet_daddr, ntohs(inet->inet_dport), size, - hc->tx_s, hc->tx_rtt, hc->tx_p, - hc->tx_x_calc, hc->tx_x_recv >> 6, - hc->tx_x >> 6, hc->tx_t_ipi); - else - printl("%pI4:%u %pI4:%u
[PATCH net-next v6 3/6] net: sctp: Add SCTP ACK tracking trace event
Add SCTP ACK tracking trace event to trace the changes of SCTP association state in response to incoming packets. It is used for debugging SCTP congestion control algorithms, and will replace sctp_probe module. Note that this event a bit tricky. Since this consists of 2 events (sctp_probe and sctp_probe_path) so you have to enable both events as below. # cd /sys/kernel/debug/tracing # echo 1 > events/sctp/sctp_probe/enable # echo 1 > events/sctp/sctp_probe_path/enable Or, you can enable all the events under sctp. # echo 1 > events/sctp/enable Since sctp_probe_path event is always invoked from sctp_probe event, you can not see any output if you only enable sctp_probe_path. Signed-off-by: Masami Hiramatsu--- Changes in v3: - Add checking whether sctp_probe_path event is enabled before iterating sctp paths to record. Thanks Steven. Changes in v4: - Move a temporal variable definition in the block. - Fix to cast pointer to unsigned long instead of __u64 for 32bit environment. --- include/trace/events/sctp.h | 99 +++ net/sctp/sm_statefuns.c |5 ++ 2 files changed, 104 insertions(+) create mode 100644 include/trace/events/sctp.h diff --git a/include/trace/events/sctp.h b/include/trace/events/sctp.h new file mode 100644 index ..7475c7be165a --- /dev/null +++ b/include/trace/events/sctp.h @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM sctp + +#if !defined(_TRACE_SCTP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_SCTP_H + +#include +#include + +TRACE_EVENT(sctp_probe_path, + + TP_PROTO(struct sctp_transport *sp, +const struct sctp_association *asoc), + + TP_ARGS(sp, asoc), + + TP_STRUCT__entry( + __field(__u64, asoc) + __field(__u32, primary) + __array(__u8, ipaddr, sizeof(union sctp_addr)) + __field(__u32, state) + __field(__u32, cwnd) + __field(__u32, ssthresh) + __field(__u32, flight_size) + __field(__u32, partial_bytes_acked) + __field(__u32, pathmtu) + ), + + TP_fast_assign( + __entry->asoc = (unsigned long)asoc; + __entry->primary = (sp == asoc->peer.primary_path); + memcpy(__entry->ipaddr, >ipaddr, sizeof(union sctp_addr)); + __entry->state = sp->state; + __entry->cwnd = sp->cwnd; + __entry->ssthresh = sp->ssthresh; + __entry->flight_size = sp->flight_size; + __entry->partial_bytes_acked = sp->partial_bytes_acked; + __entry->pathmtu = sp->pathmtu; + ), + + TP_printk("asoc=%#llx%s ipaddr=%pISpc state=%u cwnd=%u ssthresh=%u " + "flight_size=%u partial_bytes_acked=%u pathmtu=%u", + __entry->asoc, __entry->primary ? "(*)" : "", + __entry->ipaddr, __entry->state, __entry->cwnd, + __entry->ssthresh, __entry->flight_size, + __entry->partial_bytes_acked, __entry->pathmtu) +); + +TRACE_EVENT(sctp_probe, + + TP_PROTO(const struct sctp_endpoint *ep, +const struct sctp_association *asoc, +struct sctp_chunk *chunk), + + TP_ARGS(ep, asoc, chunk), + + TP_STRUCT__entry( + __field(__u64, asoc) + __field(__u32, mark) + __field(__u16, bind_port) + __field(__u16, peer_port) + __field(__u32, pathmtu) + __field(__u32, rwnd) + __field(__u16, unack_data) + ), + + TP_fast_assign( + struct sk_buff *skb = chunk->skb; + + __entry->asoc = (unsigned long)asoc; + __entry->mark = skb->mark; + __entry->bind_port = ep->base.bind_addr.port; + __entry->peer_port = asoc->peer.port; + __entry->pathmtu = asoc->pathmtu; + __entry->rwnd = asoc->peer.rwnd; + __entry->unack_data = asoc->unack_data; + + if (trace_sctp_probe_path_enabled()) { + struct sctp_transport *sp; + + list_for_each_entry(sp, >peer.transport_addr_list, + transports) { + trace_sctp_probe_path(sp, asoc); + } + } + ), + + TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d " + "rwnd=%u unack_data=%d", + __entry->asoc, __entry->mark, __entry->bind_port, + __entry->peer_port, __entry->pathmtu, __entry->rwnd, + __entry->unack_data) +); + +#endif /* _TRACE_SCTP_H */ + +/* This part must be outside protection */ +#include diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index
[PATCH net-next v6 6/6] net: dccp: Remove dccpprobe module
Remove DCCP probe module since jprobe has been deprecated. That function is now replaced by dccp/dccp_probe trace-event. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu --- Changes in v5: - Fix a conflict with previous change in Makefile. --- net/dccp/Kconfig | 17 net/dccp/Makefile |2 - net/dccp/probe.c | 203 - 3 files changed, 222 deletions(-) delete mode 100644 net/dccp/probe.c diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig index 8c0ef71bed2f..b270e84d9c13 100644 --- a/net/dccp/Kconfig +++ b/net/dccp/Kconfig @@ -39,23 +39,6 @@ config IP_DCCP_DEBUG Just say N. -config NET_DCCPPROBE - tristate "DCCP connection probing" - depends on PROC_FS && KPROBES - ---help--- - This module allows for capturing the changes to DCCP connection - state in response to incoming packets. It is used for debugging - DCCP congestion avoidance modules. If you don't understand - what was just said, you don't need it: say N. - - Documentation on how to use DCCP connection probing can be found - at: - - http://www.linuxfoundation.org/collaborate/workgroups/networking/dccpprobe - - To compile this code as a module, choose M here: the - module will be called dccp_probe. - endmenu diff --git a/net/dccp/Makefile b/net/dccp/Makefile index 4215f13a63af..5b4ff37bc806 100644 --- a/net/dccp/Makefile +++ b/net/dccp/Makefile @@ -21,12 +21,10 @@ obj-$(subst y,$(CONFIG_IP_DCCP),$(CONFIG_IPV6)) += dccp_ipv6.o dccp_ipv6-y := ipv6.o obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o -obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o dccp-$(CONFIG_SYSCTL) += sysctl.o dccp_diag-y := diag.o -dccp_probe-y := probe.o # build with local directory for trace.h CFLAGS_proto.o := -I$(src) diff --git a/net/dccp/probe.c b/net/dccp/probe.c deleted file mode 100644 index 3d3fda05b32d.. --- a/net/dccp/probe.c +++ /dev/null @@ -1,203 +0,0 @@ -/* - * dccp_probe - Observe the DCCP flow with kprobes. - * - * The idea for this came from Werner Almesberger's umlsim - * Copyright (C) 2004, Stephen Hemminger - * - * Modified for DCCP from Stephen Hemminger's code - * Copyright (C) 2006, Ian McDonald - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "dccp.h" -#include "ccid.h" -#include "ccids/ccid3.h" - -static int port; - -static int bufsize = 64 * 1024; - -static const char procname[] = "dccpprobe"; - -static struct { - struct kfifo fifo; - spinlock_tlock; - wait_queue_head_t wait; - struct timespec64 tstart; -} dccpw; - -static void printl(const char *fmt, ...) -{ - va_list args; - int len; - struct timespec64 now; - char tbuf[256]; - - va_start(args, fmt); - getnstimeofday64(); - - now = timespec64_sub(now, dccpw.tstart); - - len = sprintf(tbuf, "%lu.%06lu ", - (unsigned long) now.tv_sec, - (unsigned long) now.tv_nsec / NSEC_PER_USEC); - len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args); - va_end(args); - - kfifo_in_locked(, tbuf, len, ); - wake_up(); -} - -static int jdccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) -{ - const struct inet_sock *inet = inet_sk(sk); - struct ccid3_hc_tx_sock *hc = NULL; - - if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3) - hc = ccid3_hc_tx_sk(sk); - - if (port == 0 || ntohs(inet->inet_dport) == port || - ntohs(inet->inet_sport) == port) { - if (hc) - printl("%pI4:%u %pI4:%u %d %d %d %d %u %llu %llu %d\n", - >inet_saddr, ntohs(inet->inet_sport), - >inet_daddr, ntohs(inet->inet_dport), size, - hc->tx_s, hc->tx_rtt, hc->tx_p, - hc->tx_x_calc, hc->tx_x_recv >> 6, - hc->tx_x >> 6, hc->tx_t_ipi); - else - printl("%pI4:%u %pI4:%u %d\n", - >inet_saddr,
[PATCH net-next v6 3/6] net: sctp: Add SCTP ACK tracking trace event
Add SCTP ACK tracking trace event to trace the changes of SCTP association state in response to incoming packets. It is used for debugging SCTP congestion control algorithms, and will replace sctp_probe module. Note that this event a bit tricky. Since this consists of 2 events (sctp_probe and sctp_probe_path) so you have to enable both events as below. # cd /sys/kernel/debug/tracing # echo 1 > events/sctp/sctp_probe/enable # echo 1 > events/sctp/sctp_probe_path/enable Or, you can enable all the events under sctp. # echo 1 > events/sctp/enable Since sctp_probe_path event is always invoked from sctp_probe event, you can not see any output if you only enable sctp_probe_path. Signed-off-by: Masami Hiramatsu --- Changes in v3: - Add checking whether sctp_probe_path event is enabled before iterating sctp paths to record. Thanks Steven. Changes in v4: - Move a temporal variable definition in the block. - Fix to cast pointer to unsigned long instead of __u64 for 32bit environment. --- include/trace/events/sctp.h | 99 +++ net/sctp/sm_statefuns.c |5 ++ 2 files changed, 104 insertions(+) create mode 100644 include/trace/events/sctp.h diff --git a/include/trace/events/sctp.h b/include/trace/events/sctp.h new file mode 100644 index ..7475c7be165a --- /dev/null +++ b/include/trace/events/sctp.h @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM sctp + +#if !defined(_TRACE_SCTP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_SCTP_H + +#include +#include + +TRACE_EVENT(sctp_probe_path, + + TP_PROTO(struct sctp_transport *sp, +const struct sctp_association *asoc), + + TP_ARGS(sp, asoc), + + TP_STRUCT__entry( + __field(__u64, asoc) + __field(__u32, primary) + __array(__u8, ipaddr, sizeof(union sctp_addr)) + __field(__u32, state) + __field(__u32, cwnd) + __field(__u32, ssthresh) + __field(__u32, flight_size) + __field(__u32, partial_bytes_acked) + __field(__u32, pathmtu) + ), + + TP_fast_assign( + __entry->asoc = (unsigned long)asoc; + __entry->primary = (sp == asoc->peer.primary_path); + memcpy(__entry->ipaddr, >ipaddr, sizeof(union sctp_addr)); + __entry->state = sp->state; + __entry->cwnd = sp->cwnd; + __entry->ssthresh = sp->ssthresh; + __entry->flight_size = sp->flight_size; + __entry->partial_bytes_acked = sp->partial_bytes_acked; + __entry->pathmtu = sp->pathmtu; + ), + + TP_printk("asoc=%#llx%s ipaddr=%pISpc state=%u cwnd=%u ssthresh=%u " + "flight_size=%u partial_bytes_acked=%u pathmtu=%u", + __entry->asoc, __entry->primary ? "(*)" : "", + __entry->ipaddr, __entry->state, __entry->cwnd, + __entry->ssthresh, __entry->flight_size, + __entry->partial_bytes_acked, __entry->pathmtu) +); + +TRACE_EVENT(sctp_probe, + + TP_PROTO(const struct sctp_endpoint *ep, +const struct sctp_association *asoc, +struct sctp_chunk *chunk), + + TP_ARGS(ep, asoc, chunk), + + TP_STRUCT__entry( + __field(__u64, asoc) + __field(__u32, mark) + __field(__u16, bind_port) + __field(__u16, peer_port) + __field(__u32, pathmtu) + __field(__u32, rwnd) + __field(__u16, unack_data) + ), + + TP_fast_assign( + struct sk_buff *skb = chunk->skb; + + __entry->asoc = (unsigned long)asoc; + __entry->mark = skb->mark; + __entry->bind_port = ep->base.bind_addr.port; + __entry->peer_port = asoc->peer.port; + __entry->pathmtu = asoc->pathmtu; + __entry->rwnd = asoc->peer.rwnd; + __entry->unack_data = asoc->unack_data; + + if (trace_sctp_probe_path_enabled()) { + struct sctp_transport *sp; + + list_for_each_entry(sp, >peer.transport_addr_list, + transports) { + trace_sctp_probe_path(sp, asoc); + } + } + ), + + TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d " + "rwnd=%u unack_data=%d", + __entry->asoc, __entry->mark, __entry->bind_port, + __entry->peer_port, __entry->pathmtu, __entry->rwnd, + __entry->unack_data) +); + +#endif /* _TRACE_SCTP_H */ + +/* This part must be outside protection */ +#include diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index 541f34735346..eb7905ffe5f2
[PATCH net-next v6 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events
Hi, This series is v6 of the replacement of jprobe usage with trace events. This version fixes trace/events/tcp.h to avoid sparse warning, as same as Mat Martineau's patch. Previous version is here; https://www.spinics.net/lists/netdev/msg474057.html Changes from v5: [1/6]: Avoid preprocessor directives in tracepoint macro args Thank you, --- Masami Hiramatsu (6): net: tcp: Add trace events for TCP congestion window tracing net: tcp: Remove TCP probe module net: sctp: Add SCTP ACK tracking trace event net: sctp: Remove debug SCTP probe module net: dccp: Add DCCP sendmsg trace event net: dccp: Remove dccpprobe module include/trace/events/sctp.h | 99 ++ include/trace/events/tcp.h | 97 ++ net/Kconfig | 17 -- net/dccp/Kconfig| 17 -- net/dccp/Makefile |5 - net/dccp/probe.c| 203 - net/dccp/proto.c|5 + net/dccp/trace.h| 105 +++ net/ipv4/Makefile |1 net/ipv4/tcp_input.c|3 net/ipv4/tcp_probe.c| 301 --- net/sctp/Kconfig| 12 -- net/sctp/Makefile |3 net/sctp/probe.c| 244 --- net/sctp/sm_statefuns.c |5 + 15 files changed, 317 insertions(+), 800 deletions(-) create mode 100644 include/trace/events/sctp.h delete mode 100644 net/dccp/probe.c create mode 100644 net/dccp/trace.h delete mode 100644 net/ipv4/tcp_probe.c delete mode 100644 net/sctp/probe.c -- Masami Hiramatsu (Linaro)
[PATCH net-next v6 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events
Hi, This series is v6 of the replacement of jprobe usage with trace events. This version fixes trace/events/tcp.h to avoid sparse warning, as same as Mat Martineau's patch. Previous version is here; https://www.spinics.net/lists/netdev/msg474057.html Changes from v5: [1/6]: Avoid preprocessor directives in tracepoint macro args Thank you, --- Masami Hiramatsu (6): net: tcp: Add trace events for TCP congestion window tracing net: tcp: Remove TCP probe module net: sctp: Add SCTP ACK tracking trace event net: sctp: Remove debug SCTP probe module net: dccp: Add DCCP sendmsg trace event net: dccp: Remove dccpprobe module include/trace/events/sctp.h | 99 ++ include/trace/events/tcp.h | 97 ++ net/Kconfig | 17 -- net/dccp/Kconfig| 17 -- net/dccp/Makefile |5 - net/dccp/probe.c| 203 - net/dccp/proto.c|5 + net/dccp/trace.h| 105 +++ net/ipv4/Makefile |1 net/ipv4/tcp_input.c|3 net/ipv4/tcp_probe.c| 301 --- net/sctp/Kconfig| 12 -- net/sctp/Makefile |3 net/sctp/probe.c| 244 --- net/sctp/sm_statefuns.c |5 + 15 files changed, 317 insertions(+), 800 deletions(-) create mode 100644 include/trace/events/sctp.h delete mode 100644 net/dccp/probe.c create mode 100644 net/dccp/trace.h delete mode 100644 net/ipv4/tcp_probe.c delete mode 100644 net/sctp/probe.c -- Masami Hiramatsu (Linaro)
Re: [PATCH 2/2] staging: iio: add spaces around '-' operator
On Wed, 27 Dec 2017, Ji-Hun Kim wrote: > Clean up checkpatch warning: > CHECK: spaces preferred around that '-' (ctx:VxV) > > Signed-off-by: Ji-Hun Kim> --- > drivers/staging/iio/adc/ad7192.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/staging/iio/adc/ad7192.c > b/drivers/staging/iio/adc/ad7192.c > index f015955..c4eff71 100644 > --- a/drivers/staging/iio/adc/ad7192.c > +++ b/drivers/staging/iio/adc/ad7192.c > @@ -340,7 +340,7 @@ ad7192_show_scale_available(struct device *dev, > } > > static IIO_DEVICE_ATTR_NAMED(in_v_m_v_scale_available, > - in_voltage-voltage_scale_available, > + in_voltage - voltage_scale_available, I think this has been discussed at length before, and it is a hyphen not a subtraction. IIO_DEVICE_ATTR_NAMED is a macro, as indicated by the capital letters, and you have to look and see what the code expands into. julia >0444, ad7192_show_scale_available, NULL, 0); > > static IIO_DEVICE_ATTR(in_voltage_scale_available, 0444, > -- > 2.10.1 (Apple Git-78) > > -- > To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [PATCH 2/2] staging: iio: add spaces around '-' operator
On Wed, 27 Dec 2017, Ji-Hun Kim wrote: > Clean up checkpatch warning: > CHECK: spaces preferred around that '-' (ctx:VxV) > > Signed-off-by: Ji-Hun Kim > --- > drivers/staging/iio/adc/ad7192.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/staging/iio/adc/ad7192.c > b/drivers/staging/iio/adc/ad7192.c > index f015955..c4eff71 100644 > --- a/drivers/staging/iio/adc/ad7192.c > +++ b/drivers/staging/iio/adc/ad7192.c > @@ -340,7 +340,7 @@ ad7192_show_scale_available(struct device *dev, > } > > static IIO_DEVICE_ATTR_NAMED(in_v_m_v_scale_available, > - in_voltage-voltage_scale_available, > + in_voltage - voltage_scale_available, I think this has been discussed at length before, and it is a hyphen not a subtraction. IIO_DEVICE_ATTR_NAMED is a macro, as indicated by the capital letters, and you have to look and see what the code expands into. julia >0444, ad7192_show_scale_available, NULL, 0); > > static IIO_DEVICE_ATTR(in_voltage_scale_available, 0444, > -- > 2.10.1 (Apple Git-78) > > -- > To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [PATCH 4.14 00/74] 4.14.10-stable review
On 27 December 2017 at 22:15, Greg Kroah-Hartmanwrote: > This is the start of the stable review cycle for the 4.14.10 release. > There are 74 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri Dec 29 16:45:52 UTC 2017. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.10-rc1.gz > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.14.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64 and arm. x86_64 build results will be shared soon in this email thread. Summary kernel: 4.14.10-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 50f9cbb27797c6cf4d22ecb1eef0043a10a120cd git describe: v4.14.9-75-g50f9cbb27797 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.9-75-g50f9cbb27797 No regressions (compared to build v4.14.9) Boards, architectures and test suites: - hi6220-hikey - arm64 * boot - pass: 20, * kselftest - pass: 46, skip: 16 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 21, skip: 1 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 983, skip: 121 * ltp-timers-tests - pass: 12, juno-r2 - arm64 * boot - pass: 20, * kselftest - pass: 45, skip: 17 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 987, skip: 121 * ltp-timers-tests - pass: 12, x15 - arm * boot - pass: 20, * kselftest - pass: 41, skip: 20 * libhugetlbfs - pass: 87, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 20, skip: 2 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1037, skip: 66 * ltp-timers-tests - pass: 12, Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports Tested-by: Naresh Kamboju
Re: [PATCH 4.14 00/74] 4.14.10-stable review
On 27 December 2017 at 22:15, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.14.10 release. > There are 74 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri Dec 29 16:45:52 UTC 2017. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.10-rc1.gz > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.14.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64 and arm. x86_64 build results will be shared soon in this email thread. Summary kernel: 4.14.10-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 50f9cbb27797c6cf4d22ecb1eef0043a10a120cd git describe: v4.14.9-75-g50f9cbb27797 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.9-75-g50f9cbb27797 No regressions (compared to build v4.14.9) Boards, architectures and test suites: - hi6220-hikey - arm64 * boot - pass: 20, * kselftest - pass: 46, skip: 16 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 21, skip: 1 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 983, skip: 121 * ltp-timers-tests - pass: 12, juno-r2 - arm64 * boot - pass: 20, * kselftest - pass: 45, skip: 17 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 987, skip: 121 * ltp-timers-tests - pass: 12, x15 - arm * boot - pass: 20, * kselftest - pass: 41, skip: 20 * libhugetlbfs - pass: 87, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 20, skip: 2 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1037, skip: 66 * ltp-timers-tests - pass: 12, Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports Tested-by: Naresh Kamboju
Re: [PATCH] pinctrl: spear: Delete an error message for a failed memory allocation in spear_pinctrl_probe()
On 27-12-17, 22:48, SF Markus Elfring wrote: > From: Markus Elfring> Date: Wed, 27 Dec 2017 22:44:04 +0100 > > Omit an extra message for a memory allocation failure in this function. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > drivers/pinctrl/spear/pinctrl-spear.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/pinctrl/spear/pinctrl-spear.c > b/drivers/pinctrl/spear/pinctrl-spear.c > index 4db52ba38d8d..efe79d3f7659 100644 > --- a/drivers/pinctrl/spear/pinctrl-spear.c > +++ b/drivers/pinctrl/spear/pinctrl-spear.c > @@ -361,10 +361,8 @@ int spear_pinctrl_probe(struct platform_device *pdev, > return -ENODEV; > > pmx = devm_kzalloc(>dev, sizeof(*pmx), GFP_KERNEL); > - if (!pmx) { > - dev_err(>dev, "Can't alloc spear_pmx\n"); > + if (!pmx) > return -ENOMEM; > - } > > res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > pmx->vbase = devm_ioremap_resource(>dev, res); Acked-by: Viresh Kumar -- viresh
Re: [PATCH] pinctrl: spear: Delete an error message for a failed memory allocation in spear_pinctrl_probe()
On 27-12-17, 22:48, SF Markus Elfring wrote: > From: Markus Elfring > Date: Wed, 27 Dec 2017 22:44:04 +0100 > > Omit an extra message for a memory allocation failure in this function. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > drivers/pinctrl/spear/pinctrl-spear.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/pinctrl/spear/pinctrl-spear.c > b/drivers/pinctrl/spear/pinctrl-spear.c > index 4db52ba38d8d..efe79d3f7659 100644 > --- a/drivers/pinctrl/spear/pinctrl-spear.c > +++ b/drivers/pinctrl/spear/pinctrl-spear.c > @@ -361,10 +361,8 @@ int spear_pinctrl_probe(struct platform_device *pdev, > return -ENODEV; > > pmx = devm_kzalloc(>dev, sizeof(*pmx), GFP_KERNEL); > - if (!pmx) { > - dev_err(>dev, "Can't alloc spear_pmx\n"); > + if (!pmx) > return -ENOMEM; > - } > > res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > pmx->vbase = devm_ioremap_resource(>dev, res); Acked-by: Viresh Kumar -- viresh
Re: [PATCH] pinctrl/spear/plgpio: Delete two error messages for a failed memory allocation in plgpio_probe()
On 27-12-17, 22:39, SF Markus Elfring wrote: > From: Markus Elfring> Date: Wed, 27 Dec 2017 22:34:28 +0100 > > Omit extra messages for a memory allocation failure in this function. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > drivers/pinctrl/spear/pinctrl-plgpio.c | 8 ++-- > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/drivers/pinctrl/spear/pinctrl-plgpio.c > b/drivers/pinctrl/spear/pinctrl-plgpio.c > index 6a0ed8ab33b9..d2123e396b29 100644 > --- a/drivers/pinctrl/spear/pinctrl-plgpio.c > +++ b/drivers/pinctrl/spear/pinctrl-plgpio.c > @@ -519,10 +519,8 @@ static int plgpio_probe(struct platform_device *pdev) > int ret, irq; > > plgpio = devm_kzalloc(>dev, sizeof(*plgpio), GFP_KERNEL); > - if (!plgpio) { > - dev_err(>dev, "memory allocation fail\n"); > + if (!plgpio) > return -ENOMEM; > - } > > res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > plgpio->base = devm_ioremap_resource(>dev, res); > @@ -544,10 +542,8 @@ static int plgpio_probe(struct platform_device *pdev) > sizeof(*plgpio->csave_regs) * > DIV_ROUND_UP(plgpio->chip.ngpio, MAX_GPIO_PER_REG), > GFP_KERNEL); > - if (!plgpio->csave_regs) { > - dev_err(>dev, "csave registers memory allocation fail\n"); > + if (!plgpio->csave_regs) > return -ENOMEM; > - } > #endif > > platform_set_drvdata(pdev, plgpio); Acked-by: Viresh Kumar -- viresh
Re: [PATCH] pinctrl/spear/plgpio: Delete two error messages for a failed memory allocation in plgpio_probe()
On 27-12-17, 22:39, SF Markus Elfring wrote: > From: Markus Elfring > Date: Wed, 27 Dec 2017 22:34:28 +0100 > > Omit extra messages for a memory allocation failure in this function. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > drivers/pinctrl/spear/pinctrl-plgpio.c | 8 ++-- > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/drivers/pinctrl/spear/pinctrl-plgpio.c > b/drivers/pinctrl/spear/pinctrl-plgpio.c > index 6a0ed8ab33b9..d2123e396b29 100644 > --- a/drivers/pinctrl/spear/pinctrl-plgpio.c > +++ b/drivers/pinctrl/spear/pinctrl-plgpio.c > @@ -519,10 +519,8 @@ static int plgpio_probe(struct platform_device *pdev) > int ret, irq; > > plgpio = devm_kzalloc(>dev, sizeof(*plgpio), GFP_KERNEL); > - if (!plgpio) { > - dev_err(>dev, "memory allocation fail\n"); > + if (!plgpio) > return -ENOMEM; > - } > > res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > plgpio->base = devm_ioremap_resource(>dev, res); > @@ -544,10 +542,8 @@ static int plgpio_probe(struct platform_device *pdev) > sizeof(*plgpio->csave_regs) * > DIV_ROUND_UP(plgpio->chip.ngpio, MAX_GPIO_PER_REG), > GFP_KERNEL); > - if (!plgpio->csave_regs) { > - dev_err(>dev, "csave registers memory allocation fail\n"); > + if (!plgpio->csave_regs) > return -ENOMEM; > - } > #endif > > platform_set_drvdata(pdev, plgpio); Acked-by: Viresh Kumar -- viresh
DIRECTOR IN CHARGE: DR.PATRICE TEME
UN Visitor Centre Department of Public Information United Nations Headquarters Room DHL-1B-154 New York, NY 10017 E-mail:un...@teewars.org United Nations Compensation Unit, In Affiliation with World Bank Our Ref: UN/WBO/042UK/2017. Congratulations Beneficiary, How are you today? Hope all is well with you and family? You may not understand why this mail came to you. We have been having a meeting for the past 7 months which just ended few days ago with the secretary to the UNITED NATIONS. This email is to all the people that have been scammed in any part of the world, the UNITED NATIONS in Affiliation with WORLD BANK have agreed to compensate them with the sum of USD US$980,000.00 Dollars. This includes every foreign contractors that may have not received their contract sum, and people that have had an unfinished transaction or international businesses that failed due to Government problems etc. We found your name in the list of those who are to benefit from these compensation exercise and that is why we are contacting you, this have been agreed upon and have been signed. You are advised to contact Dr.PATRICE TEME of our paying center in Africa, as he is our representative in Nigeria, contact him immediately for your Cheque/ International Bank Draft of US$980,000.00 Dollars. This fund is in form of a Bank Draft for security purpose ok? So he will send it to you and you can clear it in any bank of your choice. Therefore, you should send him your full Name and telephone number your correct mailing address where you want him to send the Draft to you. Contact Dr.PATRICE TEME of MAGNUM PLC PAYMENT CENTER with your payment code:ST/DPI/829 immediately for your Cheque at the given address below: DIRECTOR IN CHARGE: DR.PATRICE TEME E-MAIL:customerservice-magnumb...@smallinvestors.net TELEPHONE:+ 234-817-008-4240 FAX: +234-817-008-4240 I apologize on behalf of my organization for any delay you might have encountered in receiving your fund in the past. Thanks and God bless you and your family. Hoping to hear from you as soon as you cash your Bank Draft. Making the world a better place. You are required to contact the above person and furnish him with the following of your information that will be required to avoid any mistakes:- 1. Your Full Name : 2. Your Home/Mobile Telephone No: 3. Your Home or Office Address : 4. Age/Occupation/Marital Status: 5. Scanned copy of your identification: Congratulations, and I look forward to hear from you as soon as you confirm your payment making the world a better place VPTQEFPMBZJVFXNXYIXJBBOLMIQOJNMTOKZJMR
DIRECTOR IN CHARGE: DR.PATRICE TEME
UN Visitor Centre Department of Public Information United Nations Headquarters Room DHL-1B-154 New York, NY 10017 E-mail:un...@teewars.org United Nations Compensation Unit, In Affiliation with World Bank Our Ref: UN/WBO/042UK/2017. Congratulations Beneficiary, How are you today? Hope all is well with you and family? You may not understand why this mail came to you. We have been having a meeting for the past 7 months which just ended few days ago with the secretary to the UNITED NATIONS. This email is to all the people that have been scammed in any part of the world, the UNITED NATIONS in Affiliation with WORLD BANK have agreed to compensate them with the sum of USD US$980,000.00 Dollars. This includes every foreign contractors that may have not received their contract sum, and people that have had an unfinished transaction or international businesses that failed due to Government problems etc. We found your name in the list of those who are to benefit from these compensation exercise and that is why we are contacting you, this have been agreed upon and have been signed. You are advised to contact Dr.PATRICE TEME of our paying center in Africa, as he is our representative in Nigeria, contact him immediately for your Cheque/ International Bank Draft of US$980,000.00 Dollars. This fund is in form of a Bank Draft for security purpose ok? So he will send it to you and you can clear it in any bank of your choice. Therefore, you should send him your full Name and telephone number your correct mailing address where you want him to send the Draft to you. Contact Dr.PATRICE TEME of MAGNUM PLC PAYMENT CENTER with your payment code:ST/DPI/829 immediately for your Cheque at the given address below: DIRECTOR IN CHARGE: DR.PATRICE TEME E-MAIL:customerservice-magnumb...@smallinvestors.net TELEPHONE:+ 234-817-008-4240 FAX: +234-817-008-4240 I apologize on behalf of my organization for any delay you might have encountered in receiving your fund in the past. Thanks and God bless you and your family. Hoping to hear from you as soon as you cash your Bank Draft. Making the world a better place. You are required to contact the above person and furnish him with the following of your information that will be required to avoid any mistakes:- 1. Your Full Name : 2. Your Home/Mobile Telephone No: 3. Your Home or Office Address : 4. Age/Occupation/Marital Status: 5. Scanned copy of your identification: Congratulations, and I look forward to hear from you as soon as you confirm your payment making the world a better place VPTQEFPMBZJVFXNXYIXJBBOLMIQOJNMTOKZJMR
Re: [PATCH] objtool: Fix clang enum conversion warning
On Wed, Dec 27, 2017 at 10:42:45PM -0500, Nick Desaulniers wrote: > On Wed, Dec 27, 2017 at 12:38 PM, Josh Poimboeufwrote: > > On Wed, Dec 27, 2017 at 01:34:34PM +0100, Lukas Bulwahn wrote: > >> Assuming that the authorship of this one-line change does not matter, as it > >> is largely suggested by the clang compiler anyway, and we want to move the > >> change forward, we should decide on which of three patches to move > >> forward. I can give my Reviewed-by and Tested-by to any of them. > > I suppose Ingo would take the first and accumulate Reviewed-By tags. > I don't particularly care about authorship (please just fix the bug). > > > The patch from Lukas was the first one I received, so that's the one I > > used. I rewrote the commit msg for clarity and added my SOB and sent it > > to Ingo for merging. > > I think you should have kept Nicholas Mc Guire's Reviewed by tag? > Maybe Ingo can re-add his and mine when merging? Yes, sorry, I missed that one. Ingo, can you please add the following? Reviewed-by: Nicholas Mc Guire Reviewed-by: Nick Desaulniers -- Josh
Re: [PATCH] objtool: Fix clang enum conversion warning
On Wed, Dec 27, 2017 at 10:42:45PM -0500, Nick Desaulniers wrote: > On Wed, Dec 27, 2017 at 12:38 PM, Josh Poimboeuf wrote: > > On Wed, Dec 27, 2017 at 01:34:34PM +0100, Lukas Bulwahn wrote: > >> Assuming that the authorship of this one-line change does not matter, as it > >> is largely suggested by the clang compiler anyway, and we want to move the > >> change forward, we should decide on which of three patches to move > >> forward. I can give my Reviewed-by and Tested-by to any of them. > > I suppose Ingo would take the first and accumulate Reviewed-By tags. > I don't particularly care about authorship (please just fix the bug). > > > The patch from Lukas was the first one I received, so that's the one I > > used. I rewrote the commit msg for clarity and added my SOB and sent it > > to Ingo for merging. > > I think you should have kept Nicholas Mc Guire's Reviewed by tag? > Maybe Ingo can re-add his and mine when merging? Yes, sorry, I missed that one. Ingo, can you please add the following? Reviewed-by: Nicholas Mc Guire Reviewed-by: Nick Desaulniers -- Josh
Re: [f2fs-dev] [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write
Hi Jaegeuk, On 12/28/2017 12:26 PM, Jaegeuk Kim wrote: > On 12/23, Chao Yu wrote: >> On 2017/12/15 10:06, Jaegeuk Kim wrote: >>> On 12/14, Hyunchul Lee wrote: Hi Jaegeuk, I need your comment about the fs_iohint mount option. a) w/o fs_iohint, propagate user hints to low layer. b) w/ fs_iohint, ignore user hints, and use hints which is generated with F2FS. Chao suggests this option. because user hints are more accurate than file system. This is resonable, But I have some concerns about this option. The first thing is that blocks of a segments have different hints. This could make GC less effective. The second is that the separation between LIFE_MEDIUM and LIFE_LONG is really needed. I think that difference between them is a little ambigous for users, and LIFE_SHORT and LIFE_EXTREME is converted to different hints by F2FS. >>> >>> I think what we really can do would assign many user hints to our 3 DATA >>> logs likewise rw_hint_to_seg_type(), since it's just hints for user data. >>> Then, we can decide how to keep that as much as possible, since we have >>> another filesystem metadata such as meta and nodes. In addition, I don't >>> think we have to keep the original user-hints which makes F2FS logs be >>> messed up. >>> >>> With that mind, I can think of the below cases. Especially, if user wants >>> to keep their io_hints, we'd better recommend to use direct_io w/o >>> fs_iohints. >> >> >> >>> In order to keep this policy, I think fs_iohints would be better to be a >>> feature set by mkfs.f2fs and detected by sysfs entries for users. >>> >>> 1) w/ fs_iohints >>> >>> UserF2FS Block >>> --- >>> Meta WRITE_LIFE_MEDIUM >>> HOT_NODE WRITE_LIFE_NOTSET >>> WARM_NODE -' >>> COLD_NODE WRITE_LIFE_NONE >>> ioctl(cold) COLD_DATA WRITE_LIFE_EXTREME >>> extention list -' -' >>> WRITE_LIFE_EXTREME -' -' >>> WRITE_LIFE_SHORTHOT_DATA WRITE_LIFE_SHORT >>> >>> -- buffered_io >>> WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_LONG >>> WRITE_LIFE_NONE -' -' >>> WRITE_LIFE_MEDIUM -' -' >>> WRITE_LIFE_LONG -' -' >>> >>> -- direct_io (Not recommendable) >>> WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET >>> WRITE_LIFE_NONE -' WRITE_LIFE_NONE >>> WRITE_LIFE_MEDIUM -' WRITE_LIFE_MEDIUM >>> WRITE_LIFE_LONG -' WRITE_LIFE_LONG >> >> Agreed with above IO hint mapping rule. >> >>> >>> 2) w/o fs_iohints >>> >>> UserF2FS Block >>> --- >>> Meta - >>> HOT_NODE - >>> WARM_NODE - >>> COLD_NODE - >>> ioctl(cold) COLD_DATA - >>> extention list -' - >>> >>> -- buffered_io >>> WRITE_LIFE_EXTREME COLD_DATA - >>> WRITE_LIFE_SHORTHOT_DATA - >>> WRITE_LIFE_NOT_SET WARM_DATA - >>> WRITE_LIFE_NONE -' - >>> WRITE_LIFE_MEDIUM -' - >>> WRITE_LIFE_LONG -' - >> >> Now we recommend direct_io if user wants to give IO hint for storage, I >> suspect >> that user would suffer performance regression issue w/o buffered IO. >> >> Another problem is that, now, in Android, it will be very hard to prompt >> application to migrate their IO pattern from buffered IO to direct IO, one >> possible way is distinguishing user data lifetime from FWK, e.g. set >> WRITE_LIFE_SHORT for cache file or tmp file, set WRITE_LIFE_EXTREME for >> media file. >> >> In order to support buffered_io, would it be better to change mapping as >> below? >> >> -- buffered_io >> WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME >> WRITE_LIFE_SHORTHOT_DATA WRITE_LIFE_SHORT >> WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET >> WRITE_LIFE_NONE -' -' >> WRITE_LIFE_MEDIUM -' -' >> WRITE_LIFE_LONG -' -' > > Agreed, and it makes more sense that we'd better keep the write hints on > userdata given by applications. > > BTW, since we couldn't get any performance numbers with these, how about > adding a mount option like "-o iohints=MODE" where MODE may be one of > "fs-based",
Re: [f2fs-dev] [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write
Hi Jaegeuk, On 12/28/2017 12:26 PM, Jaegeuk Kim wrote: > On 12/23, Chao Yu wrote: >> On 2017/12/15 10:06, Jaegeuk Kim wrote: >>> On 12/14, Hyunchul Lee wrote: Hi Jaegeuk, I need your comment about the fs_iohint mount option. a) w/o fs_iohint, propagate user hints to low layer. b) w/ fs_iohint, ignore user hints, and use hints which is generated with F2FS. Chao suggests this option. because user hints are more accurate than file system. This is resonable, But I have some concerns about this option. The first thing is that blocks of a segments have different hints. This could make GC less effective. The second is that the separation between LIFE_MEDIUM and LIFE_LONG is really needed. I think that difference between them is a little ambigous for users, and LIFE_SHORT and LIFE_EXTREME is converted to different hints by F2FS. >>> >>> I think what we really can do would assign many user hints to our 3 DATA >>> logs likewise rw_hint_to_seg_type(), since it's just hints for user data. >>> Then, we can decide how to keep that as much as possible, since we have >>> another filesystem metadata such as meta and nodes. In addition, I don't >>> think we have to keep the original user-hints which makes F2FS logs be >>> messed up. >>> >>> With that mind, I can think of the below cases. Especially, if user wants >>> to keep their io_hints, we'd better recommend to use direct_io w/o >>> fs_iohints. >> >> >> >>> In order to keep this policy, I think fs_iohints would be better to be a >>> feature set by mkfs.f2fs and detected by sysfs entries for users. >>> >>> 1) w/ fs_iohints >>> >>> UserF2FS Block >>> --- >>> Meta WRITE_LIFE_MEDIUM >>> HOT_NODE WRITE_LIFE_NOTSET >>> WARM_NODE -' >>> COLD_NODE WRITE_LIFE_NONE >>> ioctl(cold) COLD_DATA WRITE_LIFE_EXTREME >>> extention list -' -' >>> WRITE_LIFE_EXTREME -' -' >>> WRITE_LIFE_SHORTHOT_DATA WRITE_LIFE_SHORT >>> >>> -- buffered_io >>> WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_LONG >>> WRITE_LIFE_NONE -' -' >>> WRITE_LIFE_MEDIUM -' -' >>> WRITE_LIFE_LONG -' -' >>> >>> -- direct_io (Not recommendable) >>> WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET >>> WRITE_LIFE_NONE -' WRITE_LIFE_NONE >>> WRITE_LIFE_MEDIUM -' WRITE_LIFE_MEDIUM >>> WRITE_LIFE_LONG -' WRITE_LIFE_LONG >> >> Agreed with above IO hint mapping rule. >> >>> >>> 2) w/o fs_iohints >>> >>> UserF2FS Block >>> --- >>> Meta - >>> HOT_NODE - >>> WARM_NODE - >>> COLD_NODE - >>> ioctl(cold) COLD_DATA - >>> extention list -' - >>> >>> -- buffered_io >>> WRITE_LIFE_EXTREME COLD_DATA - >>> WRITE_LIFE_SHORTHOT_DATA - >>> WRITE_LIFE_NOT_SET WARM_DATA - >>> WRITE_LIFE_NONE -' - >>> WRITE_LIFE_MEDIUM -' - >>> WRITE_LIFE_LONG -' - >> >> Now we recommend direct_io if user wants to give IO hint for storage, I >> suspect >> that user would suffer performance regression issue w/o buffered IO. >> >> Another problem is that, now, in Android, it will be very hard to prompt >> application to migrate their IO pattern from buffered IO to direct IO, one >> possible way is distinguishing user data lifetime from FWK, e.g. set >> WRITE_LIFE_SHORT for cache file or tmp file, set WRITE_LIFE_EXTREME for >> media file. >> >> In order to support buffered_io, would it be better to change mapping as >> below? >> >> -- buffered_io >> WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME >> WRITE_LIFE_SHORTHOT_DATA WRITE_LIFE_SHORT >> WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET >> WRITE_LIFE_NONE -' -' >> WRITE_LIFE_MEDIUM -' -' >> WRITE_LIFE_LONG -' -' > > Agreed, and it makes more sense that we'd better keep the write hints on > userdata given by applications. > > BTW, since we couldn't get any performance numbers with these, how about > adding a mount option like "-o iohints=MODE" where MODE may be one of > "fs-based",
Re: [PATCH v2] gpio: winbond: add driver
On Wed, Dec 27, 2017 at 07:42:21PM +0100, Maciej S. Szmigiero wrote: >On 27.12.2017 01:24, William Breathitt Gray wrote: >> On Mon, Dec 25, 2017 at 03:48:16PM +0100, Maciej S. Szmigiero wrote: >(..) >>> All the existing ISA bus drivers seem to depend on CONFIG_ISA_BUS_API >>> instead of selecting it but IMHO this is wrong because: >>> 1) This Kconfig option doesn't really enable or disable any bus support >>> but building of a library of some common boilerplate code. >>> Libraries are normally selected by drivers needing them and only provided >>> as an user-selectable option if there is a possibility that a out-of-tree >>> module would need it, >>> >>> 2) On x86_64 this option (or rather, its parent option CONFIG_ISA_BUS) >>> cannot be enabled without CONFIG_EXPERT, >>> >>> 3) This device isn't really a ISA bus device any more than, for example, >>> a 8250 serial port or a PC-style parallel port and these don't need >>> that an user explicitly enables "ISA bus support" in his kernel >>> configuration. >> >> I can see what you mean about selecting ISA_BUS_API rather than having >> it as a dependency for drivers. Part of the reason I added the >> CONFIG_EXPERT dependency for CONFIG_ISA_BUS -- as well as having >> CONFIG_ISA_BUS_API be a dependency for the drivers themselves -- was to >> hide the ISA-style drivers which blindly poke at I/O port addresses, >> lest a niave user enable all available drivers and unintentionally brick >> their system when the drivers execute. >> >> I think there is still merit in masking dangerous drivers such as this, >> since the expected behavior nowadays is for the driver to probe for the >> device before poking at memory; since ISA-style communication lacks a >> standard method of detecting devices in hardware, these devices >> generally pose a danger when loaded by niave users. > >This driver accesses the same Super I/O chip as w83627ehf hwmon and >w83627hf_wdt watchdog drivers. >In addition to this, there are loads of other hwmon and watchdog drivers >for x86 Super I/Os in the tree, most of them using the same probing and >communication style. >There are even existing GPIO drivers for some Super I/Os like gpio-it87 >and gpio-f7188x. > >None of these drivers need CONFIG_EXPERT to be selected. > >Also, CONFIG_EXPERT is described as "Configure standard kernel features" >and that "[it] allows certain base kernel options and settings to be >disabled or tweaked" for "specialized environments". >Enabling this driver is not about changing "standard kernel feature" or >a "base kernel option [or] setting". I'm sorry, I didn't make it quite clear in my previous reply. I agree with you that CONFIG_EXPERT shouldn't be necessary for this driver -- in the end, a select ISA_BUS_API line should be all that's needed to have ISA bus driver support for your driver. My reference to the CONFIG_EXPERT option is for masking options related to other ISA-style buses not commonly found in desktop systems. Devices like the Super I/O chip wouldn't fall into this category since LPC is pretty common and the relevant I/O port addresses are usually well known. Ultimately, the Winbond GPIO driver should not need CONFIG_EXPERT to be enabled in order to select ISA_BUS_API. William Breathitt Gray >> >> William Breathitt Gray > >Maciej Szmigiero
Re: [PATCH v2] gpio: winbond: add driver
On Wed, Dec 27, 2017 at 07:42:21PM +0100, Maciej S. Szmigiero wrote: >On 27.12.2017 01:24, William Breathitt Gray wrote: >> On Mon, Dec 25, 2017 at 03:48:16PM +0100, Maciej S. Szmigiero wrote: >(..) >>> All the existing ISA bus drivers seem to depend on CONFIG_ISA_BUS_API >>> instead of selecting it but IMHO this is wrong because: >>> 1) This Kconfig option doesn't really enable or disable any bus support >>> but building of a library of some common boilerplate code. >>> Libraries are normally selected by drivers needing them and only provided >>> as an user-selectable option if there is a possibility that a out-of-tree >>> module would need it, >>> >>> 2) On x86_64 this option (or rather, its parent option CONFIG_ISA_BUS) >>> cannot be enabled without CONFIG_EXPERT, >>> >>> 3) This device isn't really a ISA bus device any more than, for example, >>> a 8250 serial port or a PC-style parallel port and these don't need >>> that an user explicitly enables "ISA bus support" in his kernel >>> configuration. >> >> I can see what you mean about selecting ISA_BUS_API rather than having >> it as a dependency for drivers. Part of the reason I added the >> CONFIG_EXPERT dependency for CONFIG_ISA_BUS -- as well as having >> CONFIG_ISA_BUS_API be a dependency for the drivers themselves -- was to >> hide the ISA-style drivers which blindly poke at I/O port addresses, >> lest a niave user enable all available drivers and unintentionally brick >> their system when the drivers execute. >> >> I think there is still merit in masking dangerous drivers such as this, >> since the expected behavior nowadays is for the driver to probe for the >> device before poking at memory; since ISA-style communication lacks a >> standard method of detecting devices in hardware, these devices >> generally pose a danger when loaded by niave users. > >This driver accesses the same Super I/O chip as w83627ehf hwmon and >w83627hf_wdt watchdog drivers. >In addition to this, there are loads of other hwmon and watchdog drivers >for x86 Super I/Os in the tree, most of them using the same probing and >communication style. >There are even existing GPIO drivers for some Super I/Os like gpio-it87 >and gpio-f7188x. > >None of these drivers need CONFIG_EXPERT to be selected. > >Also, CONFIG_EXPERT is described as "Configure standard kernel features" >and that "[it] allows certain base kernel options and settings to be >disabled or tweaked" for "specialized environments". >Enabling this driver is not about changing "standard kernel feature" or >a "base kernel option [or] setting". I'm sorry, I didn't make it quite clear in my previous reply. I agree with you that CONFIG_EXPERT shouldn't be necessary for this driver -- in the end, a select ISA_BUS_API line should be all that's needed to have ISA bus driver support for your driver. My reference to the CONFIG_EXPERT option is for masking options related to other ISA-style buses not commonly found in desktop systems. Devices like the Super I/O chip wouldn't fall into this category since LPC is pretty common and the relevant I/O port addresses are usually well known. Ultimately, the Winbond GPIO driver should not need CONFIG_EXPERT to be enabled in order to select ISA_BUS_API. William Breathitt Gray >> >> William Breathitt Gray > >Maciej Szmigiero
Re: [PATCH V8 3/3] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values
On 27-12-17, 15:54, Rob Herring wrote: > On Wed, Dec 27, 2017 at 2:56 AM, Viresh Kumarwrote: > > On 26-12-17, 14:29, Rob Herring wrote: > >> On Mon, Dec 18, 2017 at 03:51:30PM +0530, Viresh Kumar wrote: > > > >> > +On some platforms the exact frequency or voltage may be hidden from the > >> > OS by > >> > +the firmware and the "opp-hz" or the "opp-microvolt" properties may > >> > contain > >> > +magic values that represent the frequency or voltage in a firmware > >> > dependent > >> > +way, for example an index of an array in the firmware. > >> > >> I'm still not convinced this is a good idea. > > > > You were kind-of a few days back :) > > > > lkml.kernel.org/r/CAL_JsqK-qtAaM_Ou5NtxcWR3F_q=8rmpjum-vqgtkhbtwe5...@mail.gmail.com > > Yeah, well that was before Stephen said anything. > > > So here is the deal: > > > > - I proposed "domain-performance-state" property for this stuff > > initially. > > - But Kevin didn't like that and proposed reusing "opp-hz" and > > "opp-microvolt", which we all agreed to multiple times.. > > - And we are back to the same discussion now and its painful and time > > killing for all of us. > > There's bigger issues than where we put magic values as I raised in > the other patch. > > > TBH, I don't have too strong preferences about any of the suggestions > > you guys have and I need you guys to tell me what binding changes to > > do here and I will do that. > > > >> If you have firmware > >> partially managing things, then I think we should have platform specific > >> bindings or drivers. > > > > What about the initial idea then, like "performance-state" for the > > power domains ? All platforms will anyway replicate that binding only. > > I don't really know. I don't really care either. I'll probably go > along with what everyone agrees to, but the only one I see any > agreement from is Ulf. Also, it is pretty vague as to what platforms > will use this. You claimed you can support QCom scenarios, but there's > really no evidence that that is true. Well, I sent out the code few days back based on these bindings and everyone can see how these bindings will get used now. > What I don't want to see is this > merged and then we need something more yet again in a few months for > another platform. Sure, I get your concerns. So what we need now is: - Stephen to start responding and clarify all the doubts he had as being silent isn't helping. - Or Rajendra to post patches which can prove that this is usable. The last time I had a chat with him, he confirmed that he will post patches after 4.15-rc1 and he should have posted them by now, but he didn't :( -- viresh
Re: [PATCH V8 3/3] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values
On 27-12-17, 15:54, Rob Herring wrote: > On Wed, Dec 27, 2017 at 2:56 AM, Viresh Kumar wrote: > > On 26-12-17, 14:29, Rob Herring wrote: > >> On Mon, Dec 18, 2017 at 03:51:30PM +0530, Viresh Kumar wrote: > > > >> > +On some platforms the exact frequency or voltage may be hidden from the > >> > OS by > >> > +the firmware and the "opp-hz" or the "opp-microvolt" properties may > >> > contain > >> > +magic values that represent the frequency or voltage in a firmware > >> > dependent > >> > +way, for example an index of an array in the firmware. > >> > >> I'm still not convinced this is a good idea. > > > > You were kind-of a few days back :) > > > > lkml.kernel.org/r/CAL_JsqK-qtAaM_Ou5NtxcWR3F_q=8rmpjum-vqgtkhbtwe5...@mail.gmail.com > > Yeah, well that was before Stephen said anything. > > > So here is the deal: > > > > - I proposed "domain-performance-state" property for this stuff > > initially. > > - But Kevin didn't like that and proposed reusing "opp-hz" and > > "opp-microvolt", which we all agreed to multiple times.. > > - And we are back to the same discussion now and its painful and time > > killing for all of us. > > There's bigger issues than where we put magic values as I raised in > the other patch. > > > TBH, I don't have too strong preferences about any of the suggestions > > you guys have and I need you guys to tell me what binding changes to > > do here and I will do that. > > > >> If you have firmware > >> partially managing things, then I think we should have platform specific > >> bindings or drivers. > > > > What about the initial idea then, like "performance-state" for the > > power domains ? All platforms will anyway replicate that binding only. > > I don't really know. I don't really care either. I'll probably go > along with what everyone agrees to, but the only one I see any > agreement from is Ulf. Also, it is pretty vague as to what platforms > will use this. You claimed you can support QCom scenarios, but there's > really no evidence that that is true. Well, I sent out the code few days back based on these bindings and everyone can see how these bindings will get used now. > What I don't want to see is this > merged and then we need something more yet again in a few months for > another platform. Sure, I get your concerns. So what we need now is: - Stephen to start responding and clarify all the doubts he had as being silent isn't helping. - Or Rajendra to post patches which can prove that this is usable. The last time I had a chat with him, he confirmed that he will post patches after 4.15-rc1 and he should have posted them by now, but he didn't :( -- viresh
Re: [PATCH] Documentation: gpu/i915: rename intel_guc_loader.c
On Wed, Dec 27, 2017 at 08:19:49PM -0800, Randy Dunlap wrote: > https://marc.info/?l=dri-devel=151237442131113=2 > > Already applied to drm/ tree. Gotcha; cheers. -- Joey Pabalinas signature.asc Description: PGP signature
Re: [PATCH] Documentation: gpu/i915: rename intel_guc_loader.c
On Wed, Dec 27, 2017 at 08:19:49PM -0800, Randy Dunlap wrote: > https://marc.info/?l=dri-devel=151237442131113=2 > > Already applied to drm/ tree. Gotcha; cheers. -- Joey Pabalinas signature.asc Description: PGP signature
Re: [PATCH v2] gpio: winbond: add driver
On Wed, Dec 27, 2017 at 12:16:53PM +0100, Linus Walleij wrote: >On Wed, Dec 27, 2017 at 1:24 AM, William Breathitt Gray >wrote: > >> The issue seems to relate to the select GPIOLIB line for the STX104 >> Kconfig option (which has a ISA_BUS_API dependency). Switching GPIOLIB >> to be a dependency, or alternatively selecting ISA_BUS_API, alleviates >> the recursion. >> >> Linus, is my use of select GPIOLIB for the STX104 Kconfig option >> appropriate in this context -- or should it instead be part of the >> depends on line? The STX104 driver includes linux/gpio/driver.h and >> makes use of the devm_gpiochip_add_data function to add support for some >> minor auxililary GPIO lines on the STX104 device. > >In the STX104 case, it seems to be appropriate to >select GPIOLIB, as it is a GPIO provider, not consumer. > >Usually I prefer that drivers just select what they need so I don't >have to run around in the whole kernel tree and turn things on >to the left and right before I can finally select my driver, but >maybe that is just me. > >The other ISA GPIO drivers depends on ISA_BUS_API, I guess >in difference from the symbol GPIOLIB it cannot be universally >selected, so shouldn't this driver also just depends on ISA_BUS_API >and select it from the machine or wherever? When I initially submitted the PC/104 device drivers, I added ISA_BUS_API to their depends on lines in order to mask the respective drivers from users who do not have a PC/104 bus on their system; in retrospect, I shouldn't have done it this way. I later added a proper CONFIG_PC104 option to achieve the masking I initially intended. The correct semantic would be to treat ISA_BUS_API as we do GPIOLIB: allow drivers to select it when needed. The reason is that CONFIG_ISA_BUS_API simply enables the compilation of the drivers/base/isa.c file. This file has no other Kconfig dependencies and simply provides a thin layer of code to eliminate some boilerplate common to ISA-style device drivers; no hardware interactions occur within this code, just pure software abstractions. Given that CONFIG_PC104 now fulfills the original purpose of adding ISA_BUS_API to the depends on line for the PC/104 device drivers, and that ISA_BUS_API can be selected as necessary with no danger to the integrity of the system, I think it would be appropriate to change all the existing ISA_BUS_API depends on lines to respective select lines. Does that logic make sense? William Breathitt Gray > >Yours, >Linus Walleij
Re: [PATCH v2] gpio: winbond: add driver
On Wed, Dec 27, 2017 at 12:16:53PM +0100, Linus Walleij wrote: >On Wed, Dec 27, 2017 at 1:24 AM, William Breathitt Gray > wrote: > >> The issue seems to relate to the select GPIOLIB line for the STX104 >> Kconfig option (which has a ISA_BUS_API dependency). Switching GPIOLIB >> to be a dependency, or alternatively selecting ISA_BUS_API, alleviates >> the recursion. >> >> Linus, is my use of select GPIOLIB for the STX104 Kconfig option >> appropriate in this context -- or should it instead be part of the >> depends on line? The STX104 driver includes linux/gpio/driver.h and >> makes use of the devm_gpiochip_add_data function to add support for some >> minor auxililary GPIO lines on the STX104 device. > >In the STX104 case, it seems to be appropriate to >select GPIOLIB, as it is a GPIO provider, not consumer. > >Usually I prefer that drivers just select what they need so I don't >have to run around in the whole kernel tree and turn things on >to the left and right before I can finally select my driver, but >maybe that is just me. > >The other ISA GPIO drivers depends on ISA_BUS_API, I guess >in difference from the symbol GPIOLIB it cannot be universally >selected, so shouldn't this driver also just depends on ISA_BUS_API >and select it from the machine or wherever? When I initially submitted the PC/104 device drivers, I added ISA_BUS_API to their depends on lines in order to mask the respective drivers from users who do not have a PC/104 bus on their system; in retrospect, I shouldn't have done it this way. I later added a proper CONFIG_PC104 option to achieve the masking I initially intended. The correct semantic would be to treat ISA_BUS_API as we do GPIOLIB: allow drivers to select it when needed. The reason is that CONFIG_ISA_BUS_API simply enables the compilation of the drivers/base/isa.c file. This file has no other Kconfig dependencies and simply provides a thin layer of code to eliminate some boilerplate common to ISA-style device drivers; no hardware interactions occur within this code, just pure software abstractions. Given that CONFIG_PC104 now fulfills the original purpose of adding ISA_BUS_API to the depends on line for the PC/104 device drivers, and that ISA_BUS_API can be selected as necessary with no danger to the integrity of the system, I think it would be appropriate to change all the existing ISA_BUS_API depends on lines to respective select lines. Does that logic make sense? William Breathitt Gray > >Yours, >Linus Walleij
Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On 12/27/17 8:16 PM, Steven Rostedt wrote: On Wed, 27 Dec 2017 19:45:42 -0800 Alexei Starovoitovwrote: I don't think that's the case. My reading of current trace_kprobe_ftrace() -> arch_check_ftrace_location() is that it will not be true for old mcount case. In the old mcount case, you can't use ftrace to return without calling the function. That is, no modification of the return ip, unless you created a trampoline that could handle arbitrary stack frames, and remove them from the stack before returning back to the function. correct. I was saying that trace_kprobe_ftrace() won't let us do bpf_override_return with old mcount. As far as the rest of your arguments it very much puzzles me that you claim that this patch suppose to work based on historical reasoning whereas you did NOT test it. I believe that Masami is saying that the modification of the IP from kprobes has been very well tested. But I'm guessing that you still want a test case for using kprobes in this particular instance. It's not the implementation of modifying the IP that you are worried about, but the implementation of BPF using it in this case. Right? exactly. No doubt that old code works. But it doesn't mean that bpf_override_return() will continue to work in kprobes that are not ftrace based. I suspect Josef's existing test case will cover this situation. Probably only special .config is needed to disable ftrace, so "kprobe on entry but not ftrace" check will kick in. But I didn't get an impression that this situation was tested. Instead I see only logical reasoning that it's _supposed_ to work. That's not enough.
Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On 12/27/17 8:16 PM, Steven Rostedt wrote: On Wed, 27 Dec 2017 19:45:42 -0800 Alexei Starovoitov wrote: I don't think that's the case. My reading of current trace_kprobe_ftrace() -> arch_check_ftrace_location() is that it will not be true for old mcount case. In the old mcount case, you can't use ftrace to return without calling the function. That is, no modification of the return ip, unless you created a trampoline that could handle arbitrary stack frames, and remove them from the stack before returning back to the function. correct. I was saying that trace_kprobe_ftrace() won't let us do bpf_override_return with old mcount. As far as the rest of your arguments it very much puzzles me that you claim that this patch suppose to work based on historical reasoning whereas you did NOT test it. I believe that Masami is saying that the modification of the IP from kprobes has been very well tested. But I'm guessing that you still want a test case for using kprobes in this particular instance. It's not the implementation of modifying the IP that you are worried about, but the implementation of BPF using it in this case. Right? exactly. No doubt that old code works. But it doesn't mean that bpf_override_return() will continue to work in kprobes that are not ftrace based. I suspect Josef's existing test case will cover this situation. Probably only special .config is needed to disable ftrace, so "kprobe on entry but not ftrace" check will kick in. But I didn't get an impression that this situation was tested. Instead I see only logical reasoning that it's _supposed_ to work. That's not enough.
Re: [RFC V7 2/2] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values
On 27-12-17, 15:36, Rob Herring wrote: > On Tue, Dec 26, 2017 at 10:45 PM, Viresh Kumar> wrote: > > On 26-12-17, 14:23, Rob Herring wrote: > >> > cpu_opp_table: cpu_opp_table { > >> > compatible = "operating-points-v2"; > >> > opp-shared; > >> > > >> > opp00 { > >> > opp-hz = /bits/ 64 <20800>; > >> > clock-latency-ns = <50>; > >> > power-domain-opp = <_opp_1>; > >> > >> What is this? opp00 here is not a device. One OPP should not point to > >> another. "power-domain-opp" is only supposed to appear in devices > >> alongside power-domains properties. > > > > There are two type of devices: > > > > A.) With fixed performance state requirements and they will have the > > new "required-opp" property in the device node itself as you said. > > > > B.) Devices which can do DVFS (CPU, MMC, LCD, etc) and those may need > > a different performance state of the domain for their individual OPPs > > and so we can't have this property in the device all the time. > > > > Does this make sense ? > > No. From the definition for power-domain-opp > > "+- power-domain-opp: This contains phandle to one of the OPP nodes of > the master > + power domain. This specifies the minimum required OPP of the master > domain for > + the functioning of the device in this OPP (where this property is present). The per-opp thing was mentioned here. > + This property can only be set for a device if the device node contains the > + "power-domains" property. This was trying to say something else, though it wasn't clear and so your concerns. I wanted to say that the device node or its OPP nodes can have the "power-domain-opp" property only if the device node has a "power-domains" property. i.e. you need to have power domain first and then only the power-domain-opp property. > Also, either all or none of the OPP nodes in an OPP > + table should have it set." > > In the above example, you are violating the next to last sentence. > > Though, I'm now confused by what the last sentence means. Yeah, lets leave it as is as the V8 has changed this significantly and you already Acked it :) -- viresh
Re: [RFC V7 2/2] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values
On 27-12-17, 15:36, Rob Herring wrote: > On Tue, Dec 26, 2017 at 10:45 PM, Viresh Kumar > wrote: > > On 26-12-17, 14:23, Rob Herring wrote: > >> > cpu_opp_table: cpu_opp_table { > >> > compatible = "operating-points-v2"; > >> > opp-shared; > >> > > >> > opp00 { > >> > opp-hz = /bits/ 64 <20800>; > >> > clock-latency-ns = <50>; > >> > power-domain-opp = <_opp_1>; > >> > >> What is this? opp00 here is not a device. One OPP should not point to > >> another. "power-domain-opp" is only supposed to appear in devices > >> alongside power-domains properties. > > > > There are two type of devices: > > > > A.) With fixed performance state requirements and they will have the > > new "required-opp" property in the device node itself as you said. > > > > B.) Devices which can do DVFS (CPU, MMC, LCD, etc) and those may need > > a different performance state of the domain for their individual OPPs > > and so we can't have this property in the device all the time. > > > > Does this make sense ? > > No. From the definition for power-domain-opp > > "+- power-domain-opp: This contains phandle to one of the OPP nodes of > the master > + power domain. This specifies the minimum required OPP of the master > domain for > + the functioning of the device in this OPP (where this property is present). The per-opp thing was mentioned here. > + This property can only be set for a device if the device node contains the > + "power-domains" property. This was trying to say something else, though it wasn't clear and so your concerns. I wanted to say that the device node or its OPP nodes can have the "power-domain-opp" property only if the device node has a "power-domains" property. i.e. you need to have power domain first and then only the power-domain-opp property. > Also, either all or none of the OPP nodes in an OPP > + table should have it set." > > In the above example, you are violating the next to last sentence. > > Though, I'm now confused by what the last sentence means. Yeah, lets leave it as is as the V8 has changed this significantly and you already Acked it :) -- viresh
Re: [PATCH 04/12] ext2: drop unneeded newline
On Wed, Dec 27, 2017 at 03:51:37PM +0100, Julia Lawall wrote: > ext2_msg prints a newline at the end of the message string, so the message > string does not need to include a newline explicitly. Done using > Coccinelle. > > Signed-off-by: Julia LawallReviewed-by: Theodore Ts'o - Ted
Re: [PATCH 04/12] ext2: drop unneeded newline
On Wed, Dec 27, 2017 at 03:51:37PM +0100, Julia Lawall wrote: > ext2_msg prints a newline at the end of the message string, so the message > string does not need to include a newline explicitly. Done using > Coccinelle. > > Signed-off-by: Julia Lawall Reviewed-by: Theodore Ts'o - Ted
Re: [RFC PATCH v11 4/5] PCI / PM: Add support for the PCIe WAKE# signal for OF
* Rafael J. Wysocki[171228 00:51]: > On Wed, Dec 27, 2017 at 4:08 PM, Tony Lindgren wrote: > > * Rafael J. Wysocki [171227 01:00]: > >> On Tuesday, December 26, 2017 2:06:47 AM CET JeffyChen wrote: > >> > Hi Rafael, > >> > > >> > Thanks for your reply :) > >> > > >> > On 12/26/2017 08:11 AM, Rafael J. Wysocki wrote: > >> > >> >+ > >> > >> >+ dn = pci_device_to_OF_node(ppdev); > >> > >> >+ if (!dn) > >> > >> >+ return 0; > >> > >> >+ > >> > >> >+ irq = of_irq_get_byname(dn, "wakeup"); > >> > > Why is this a property of the bridge and not of the device itself? > >> > > >> > That is suggested by Brian, because in that way, the wakeup pin would > >> > not "tied to what exact device is installed (or no device, if it's a > >> > slot)." > >> > >> But I don't think it works when there are two devices using different WAKE# > >> interrupt lines under the same bridge. Or how does it work then? > > > > It won't work currently for multiple devices but adding more than > > one wakeriq per device is doable. And I think we will have other > > cases where multiple wakeirqs are connected to a single device, so > > that issue should be sorted out sooner or later. > > > > And if requesting wakeirq for the PCI WAKE# lines at the PCI > > controller does the job, then maybe that's all we need to start with. > > These are expected to be out-of-band, so not having anything to do > with the Root Complex. > > In-band PME Messages go through the PCIe hierarchy, but that is a > standard mechanism and it is supported already. > > WAKE# are platform-specific, pretty much by definition and I guess > that on most ARM boards they are just going to be some kind of GPIO > pins. OK. So probably supporting the following two configurations should be enough then: 1. One or more WAKE# lines configured as a wakeirq for the PCI controller When the wakeirq calls pm_wakeup_event() for the PCI controller device driver, the PCI controller wakes up and can deal with it's child devices 2. Optionally a WAKE# line from a PCI device configured as wakeirq for the PCI device driver In this case calling the PM runtime resume in the child PCI device will also wake up the parent PCI controller, and then the PCI controller can deal with it's children Seems like this series is pretty close to 1 above except we need to have a list of wakeirqs per device instead of just one. And option 2 should already work as long as the PCI device driver parses and configures the wakeirq. > > Then in addition to that, we could do the following to allow > > PCI devices to request the wakeirq from the PCI controller: > > > > 1. PCI controller or framework implements a chained irq for > >the WAKE# lines assuming it can mask/unmask the WAKE# lines > > > > 2. PCI devices then can just request the wakeirq from the PCI > >controller > > > > And that's about it. Optionally we could leave out the dependency > > to having PCI devices implement PM runtime and just resume the > > parent (PCI controller) if PCI devices has not implemented > > PM runtime. > > So if my understanding is correct, DT should give you the WAKE# IRQ > for the given endpoint PCI device and you only are expected to request > it. The rest should just follow from the other pieces of information > in the DT. Yeah and it seems that we should allow configuring both cases 1 and 2 above. > With the quite obvious caveat that the same IRQ may be used as WAKE# > for multiple endpoint devices (which BTW need not be under the same > bridge even). And with the shared interrupts we can't do the masking/unmasking automatically.. > >> > >> >+ if (irq == -EPROBE_DEFER) > >> > > Braces here, please. > >> > ok, will fix in the next version. > >> > > >> > > > >> > >> >+ return irq; > >> > >> >+ /* Ignore other errors, since a missing wakeup is non-fatal. > >> > >> >*/ > >> > >> >+ else if (irq < 0) { > >> > >> >+ dev_info(>dev, "cannot get wakeup interrupt: > >> > >> >%d\n", irq); > >> > >> >+ return 0; > >> > >> >+ } > >> > >> >+ > >> > >> >+ device_init_wakeup(>dev, true); > >> > > Why do you call this before dev_pm_set_dedicated_wake_irq()? > >> > > >> > hmmm, i thought so too, but it turns out the dedicated wake irq > >> > framework requires device_init_wakeup(dev, true) before attach the wake > >> > irq: > >> > > >> > int device_wakeup_attach_irq(struct device *dev, > >> > struct wake_irq *wakeirq) > >> > { > >> > struct wakeup_source *ws; > >> > > >> > ws = dev->power.wakeup; > >> > if (!ws) { > >> > dev_err(dev, "forgot to call device_init_wakeup?\n"); > >> > return -EINVAL; > >> > > >> > >> Well, that's a framework issue, fair enough. > >> > >> That said, what if user space removes the wakeup source from under you > >>
Re: [RFC PATCH v11 4/5] PCI / PM: Add support for the PCIe WAKE# signal for OF
* Rafael J. Wysocki [171228 00:51]: > On Wed, Dec 27, 2017 at 4:08 PM, Tony Lindgren wrote: > > * Rafael J. Wysocki [171227 01:00]: > >> On Tuesday, December 26, 2017 2:06:47 AM CET JeffyChen wrote: > >> > Hi Rafael, > >> > > >> > Thanks for your reply :) > >> > > >> > On 12/26/2017 08:11 AM, Rafael J. Wysocki wrote: > >> > >> >+ > >> > >> >+ dn = pci_device_to_OF_node(ppdev); > >> > >> >+ if (!dn) > >> > >> >+ return 0; > >> > >> >+ > >> > >> >+ irq = of_irq_get_byname(dn, "wakeup"); > >> > > Why is this a property of the bridge and not of the device itself? > >> > > >> > That is suggested by Brian, because in that way, the wakeup pin would > >> > not "tied to what exact device is installed (or no device, if it's a > >> > slot)." > >> > >> But I don't think it works when there are two devices using different WAKE# > >> interrupt lines under the same bridge. Or how does it work then? > > > > It won't work currently for multiple devices but adding more than > > one wakeriq per device is doable. And I think we will have other > > cases where multiple wakeirqs are connected to a single device, so > > that issue should be sorted out sooner or later. > > > > And if requesting wakeirq for the PCI WAKE# lines at the PCI > > controller does the job, then maybe that's all we need to start with. > > These are expected to be out-of-band, so not having anything to do > with the Root Complex. > > In-band PME Messages go through the PCIe hierarchy, but that is a > standard mechanism and it is supported already. > > WAKE# are platform-specific, pretty much by definition and I guess > that on most ARM boards they are just going to be some kind of GPIO > pins. OK. So probably supporting the following two configurations should be enough then: 1. One or more WAKE# lines configured as a wakeirq for the PCI controller When the wakeirq calls pm_wakeup_event() for the PCI controller device driver, the PCI controller wakes up and can deal with it's child devices 2. Optionally a WAKE# line from a PCI device configured as wakeirq for the PCI device driver In this case calling the PM runtime resume in the child PCI device will also wake up the parent PCI controller, and then the PCI controller can deal with it's children Seems like this series is pretty close to 1 above except we need to have a list of wakeirqs per device instead of just one. And option 2 should already work as long as the PCI device driver parses and configures the wakeirq. > > Then in addition to that, we could do the following to allow > > PCI devices to request the wakeirq from the PCI controller: > > > > 1. PCI controller or framework implements a chained irq for > >the WAKE# lines assuming it can mask/unmask the WAKE# lines > > > > 2. PCI devices then can just request the wakeirq from the PCI > >controller > > > > And that's about it. Optionally we could leave out the dependency > > to having PCI devices implement PM runtime and just resume the > > parent (PCI controller) if PCI devices has not implemented > > PM runtime. > > So if my understanding is correct, DT should give you the WAKE# IRQ > for the given endpoint PCI device and you only are expected to request > it. The rest should just follow from the other pieces of information > in the DT. Yeah and it seems that we should allow configuring both cases 1 and 2 above. > With the quite obvious caveat that the same IRQ may be used as WAKE# > for multiple endpoint devices (which BTW need not be under the same > bridge even). And with the shared interrupts we can't do the masking/unmasking automatically.. > >> > >> >+ if (irq == -EPROBE_DEFER) > >> > > Braces here, please. > >> > ok, will fix in the next version. > >> > > >> > > > >> > >> >+ return irq; > >> > >> >+ /* Ignore other errors, since a missing wakeup is non-fatal. > >> > >> >*/ > >> > >> >+ else if (irq < 0) { > >> > >> >+ dev_info(>dev, "cannot get wakeup interrupt: > >> > >> >%d\n", irq); > >> > >> >+ return 0; > >> > >> >+ } > >> > >> >+ > >> > >> >+ device_init_wakeup(>dev, true); > >> > > Why do you call this before dev_pm_set_dedicated_wake_irq()? > >> > > >> > hmmm, i thought so too, but it turns out the dedicated wake irq > >> > framework requires device_init_wakeup(dev, true) before attach the wake > >> > irq: > >> > > >> > int device_wakeup_attach_irq(struct device *dev, > >> > struct wake_irq *wakeirq) > >> > { > >> > struct wakeup_source *ws; > >> > > >> > ws = dev->power.wakeup; > >> > if (!ws) { > >> > dev_err(dev, "forgot to call device_init_wakeup?\n"); > >> > return -EINVAL; > >> > > >> > >> Well, that's a framework issue, fair enough. > >> > >> That said, what if user space removes the wakeup source from under you > >> concurrently via sysfs? Tony? > > > > Hmm sounds racy, need to
Re: [PATCH] Documentation: gpu/i915: rename intel_guc_loader.c
On 12/27/2017 07:28 PM, Joey Pabalinas wrote: > Commit e8668bbcb0f91c7baa ("drm/i915/guc: Rename intel_guc_loader.c to > intel_guc_fw.c") renamed drivers/gpu/drm/i915/intel_guc_loader.c > but didn't update Documentation/gpu/i915.rst. Change intel_guc_loader.c > to intel_guc_fw.c in Documentation/gpu/i915.rst. > > Signed-off-by: Joey Pabalinashttps://marc.info/?l=dri-devel=151237442131113=2 Already applied to drm/ tree. > --- > Documentation/gpu/i915.rst | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > index 2e7ee0313c1cd6c037..e21698e16534de9ccc 100644 > --- a/Documentation/gpu/i915.rst > +++ b/Documentation/gpu/i915.rst > @@ -341,10 +341,10 @@ GuC > GuC-specific firmware loader > > > -.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c > +.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c > :doc: GuC-specific firmware loader > > -.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c > +.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c > :internal: > > GuC-based command submission > -- ~Randy
Re: [PATCH v3 bpf-next 2/2] tools/bpftool: fix bpftool build with bintutils >= 2.9
On Wed, 27 Dec 2017 19:16:29 +, Roman Gushchin wrote: > Bpftool build is broken with binutils version 2.29 and later. > The cause is commit 003ca0fd2286 ("Refactor disassembler selection") > in the binutils repo, which changed the disassembler() function > signature. > > Fix this by adding a new "feature" to the tools/build/features > infrastructure and make it responsible for decision which > disassembler() function signature to use. > > Signed-off-by: Roman Gushchin> Cc: Jakub Kicinski > Cc: Alexei Starovoitov > Cc: Daniel Borkmann Acked-by: Jakub Kicinski
Re: [PATCH] Documentation: gpu/i915: rename intel_guc_loader.c
On 12/27/2017 07:28 PM, Joey Pabalinas wrote: > Commit e8668bbcb0f91c7baa ("drm/i915/guc: Rename intel_guc_loader.c to > intel_guc_fw.c") renamed drivers/gpu/drm/i915/intel_guc_loader.c > but didn't update Documentation/gpu/i915.rst. Change intel_guc_loader.c > to intel_guc_fw.c in Documentation/gpu/i915.rst. > > Signed-off-by: Joey Pabalinas https://marc.info/?l=dri-devel=151237442131113=2 Already applied to drm/ tree. > --- > Documentation/gpu/i915.rst | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > index 2e7ee0313c1cd6c037..e21698e16534de9ccc 100644 > --- a/Documentation/gpu/i915.rst > +++ b/Documentation/gpu/i915.rst > @@ -341,10 +341,10 @@ GuC > GuC-specific firmware loader > > > -.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c > +.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c > :doc: GuC-specific firmware loader > > -.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c > +.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c > :internal: > > GuC-based command submission > -- ~Randy
Re: [PATCH v3 bpf-next 2/2] tools/bpftool: fix bpftool build with bintutils >= 2.9
On Wed, 27 Dec 2017 19:16:29 +, Roman Gushchin wrote: > Bpftool build is broken with binutils version 2.29 and later. > The cause is commit 003ca0fd2286 ("Refactor disassembler selection") > in the binutils repo, which changed the disassembler() function > signature. > > Fix this by adding a new "feature" to the tools/build/features > infrastructure and make it responsible for decision which > disassembler() function signature to use. > > Signed-off-by: Roman Gushchin > Cc: Jakub Kicinski > Cc: Alexei Starovoitov > Cc: Daniel Borkmann Acked-by: Jakub Kicinski
Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On Wed, 27 Dec 2017 19:45:42 -0800 Alexei Starovoitovwrote: > I don't think that's the case. My reading of current > trace_kprobe_ftrace() -> arch_check_ftrace_location() > is that it will not be true for old mcount case. In the old mcount case, you can't use ftrace to return without calling the function. That is, no modification of the return ip, unless you created a trampoline that could handle arbitrary stack frames, and remove them from the stack before returning back to the function. > > As far as the rest of your arguments it very much puzzles me that > you claim that this patch suppose to work based on historical > reasoning whereas you did NOT test it. I believe that Masami is saying that the modification of the IP from kprobes has been very well tested. But I'm guessing that you still want a test case for using kprobes in this particular instance. It's not the implementation of modifying the IP that you are worried about, but the implementation of BPF using it in this case. Right? -- Steve
Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On Wed, 27 Dec 2017 19:45:42 -0800 Alexei Starovoitov wrote: > I don't think that's the case. My reading of current > trace_kprobe_ftrace() -> arch_check_ftrace_location() > is that it will not be true for old mcount case. In the old mcount case, you can't use ftrace to return without calling the function. That is, no modification of the return ip, unless you created a trampoline that could handle arbitrary stack frames, and remove them from the stack before returning back to the function. > > As far as the rest of your arguments it very much puzzles me that > you claim that this patch suppose to work based on historical > reasoning whereas you did NOT test it. I believe that Masami is saying that the modification of the IP from kprobes has been very well tested. But I'm guessing that you still want a test case for using kprobes in this particular instance. It's not the implementation of modifying the IP that you are worried about, but the implementation of BPF using it in this case. Right? -- Steve
Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
Hi, On 12/27/2017 05:29 PM, Gang He wrote: If we can't get inode lock immediately in the function ocfs2_inode_lock_with_page() when reading a page, we should not return directly here, since this will lead to a softlockup problem. The method is to get a blocking lock and immediately unlock before returning, this can avoid CPU resource waste due to lots of retries, and benefits fairness in getting lock among multiple nodes, increase efficiency in case modifying the same file frequently from multiple nodes. The softlockup problem looks like, Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x5c/0x82 panic+0xd5/0x21e watchdog_timer_fn+0x208/0x210 ? watchdog_park_threads+0x70/0x70 __hrtimer_run_queues+0xcc/0x200 hrtimer_interrupt+0xa6/0x1f0 smp_apic_timer_interrupt+0x34/0x50 apic_timer_interrupt+0x96/0xa0 RIP: 0010:unlock_page+0x17/0x30 RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10 RAX: dead0100 RBX: f21e009f5300 RCX: 0004 RDX: dead00ff RSI: 0202 RDI: f21e009f5300 RBP: R08: R09: af154080bb00 R10: af154080bc30 R11: 0040 R12: 993749a39518 R13: R14: f21e009f5300 R15: f21e009f5300 ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2] ocfs2_readpage+0x41/0x2d0 [ocfs2] ? pagecache_get_page+0x30/0x200 filemap_fault+0x12b/0x5c0 ? recalc_sigpending+0x17/0x50 ? __set_task_blocked+0x28/0x70 ? __set_current_blocked+0x3d/0x60 ocfs2_fault+0x29/0xb0 [ocfs2] __do_fault+0x1a/0xa0 __handle_mm_fault+0xbe8/0x1090 handle_mm_fault+0xaa/0x1f0 __do_page_fault+0x235/0x4b0 trace_do_page_fault+0x3c/0x110 async_page_fault+0x28/0x30 RIP: 0033:0x7fa75ded638e RSP: 002b:7ffd6657db18 EFLAGS: 00010287 RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700 RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700 RBP: 0003 R08: 000e R09: R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770 R13: 000e R14: 1770 R15: Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock") Signed-off-by: Gang HeOn most linux server, CONFIG_PREEMPT is not set for better system-wide throughtput. The long-time retry logic for getting page lock and inode lock can easily cause softlock, resulting in real time task like corosync when using pcmk stack cannot be scheduled on time. When multiple nodes concurrently write the same file, the performance cannot be good anyway, and it's also less possibility. The trick for avoiding the busy loop looks good to me. Reviewed-by: z...@suse.com Thanks, Eric --- fs/ocfs2/dlmglue.c | 9 + 1 file changed, 9 insertions(+) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 4689940..5193218 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode, ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK); if (ret == -EAGAIN) { unlock_page(page); + /* +* If we can't get inode lock immediately, we should not return +* directly here, since this will lead to a softlockup problem. +* The method is to get a blocking lock and immediately unlock +* before returning, this can avoid CPU resource waste due to +* lots of retries, and benefits fairness in getting lock. +*/ + if (ocfs2_inode_lock(inode, ret_bh, ex) == 0) + ocfs2_inode_unlock(inode, ex); ret = AOP_TRUNCATED_PAGE; }
Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
Hi, On 12/27/2017 05:29 PM, Gang He wrote: If we can't get inode lock immediately in the function ocfs2_inode_lock_with_page() when reading a page, we should not return directly here, since this will lead to a softlockup problem. The method is to get a blocking lock and immediately unlock before returning, this can avoid CPU resource waste due to lots of retries, and benefits fairness in getting lock among multiple nodes, increase efficiency in case modifying the same file frequently from multiple nodes. The softlockup problem looks like, Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x5c/0x82 panic+0xd5/0x21e watchdog_timer_fn+0x208/0x210 ? watchdog_park_threads+0x70/0x70 __hrtimer_run_queues+0xcc/0x200 hrtimer_interrupt+0xa6/0x1f0 smp_apic_timer_interrupt+0x34/0x50 apic_timer_interrupt+0x96/0xa0 RIP: 0010:unlock_page+0x17/0x30 RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10 RAX: dead0100 RBX: f21e009f5300 RCX: 0004 RDX: dead00ff RSI: 0202 RDI: f21e009f5300 RBP: R08: R09: af154080bb00 R10: af154080bc30 R11: 0040 R12: 993749a39518 R13: R14: f21e009f5300 R15: f21e009f5300 ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2] ocfs2_readpage+0x41/0x2d0 [ocfs2] ? pagecache_get_page+0x30/0x200 filemap_fault+0x12b/0x5c0 ? recalc_sigpending+0x17/0x50 ? __set_task_blocked+0x28/0x70 ? __set_current_blocked+0x3d/0x60 ocfs2_fault+0x29/0xb0 [ocfs2] __do_fault+0x1a/0xa0 __handle_mm_fault+0xbe8/0x1090 handle_mm_fault+0xaa/0x1f0 __do_page_fault+0x235/0x4b0 trace_do_page_fault+0x3c/0x110 async_page_fault+0x28/0x30 RIP: 0033:0x7fa75ded638e RSP: 002b:7ffd6657db18 EFLAGS: 00010287 RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700 RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700 RBP: 0003 R08: 000e R09: R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770 R13: 000e R14: 1770 R15: Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock") Signed-off-by: Gang He On most linux server, CONFIG_PREEMPT is not set for better system-wide throughtput. The long-time retry logic for getting page lock and inode lock can easily cause softlock, resulting in real time task like corosync when using pcmk stack cannot be scheduled on time. When multiple nodes concurrently write the same file, the performance cannot be good anyway, and it's also less possibility. The trick for avoiding the busy loop looks good to me. Reviewed-by: z...@suse.com Thanks, Eric --- fs/ocfs2/dlmglue.c | 9 + 1 file changed, 9 insertions(+) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 4689940..5193218 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode, ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK); if (ret == -EAGAIN) { unlock_page(page); + /* +* If we can't get inode lock immediately, we should not return +* directly here, since this will lead to a softlockup problem. +* The method is to get a blocking lock and immediately unlock +* before returning, this can avoid CPU resource waste due to +* lots of retries, and benefits fairness in getting lock. +*/ + if (ocfs2_inode_lock(inode, ret_bh, ex) == 0) + ocfs2_inode_unlock(inode, ex); ret = AOP_TRUNCATED_PAGE; }
Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework
On 12/27/17 5:38 PM, Masami Hiramatsu wrote: On Wed, 27 Dec 2017 14:49:46 -0800 Alexei Starovoitovwrote: On 12/27/17 12:09 AM, Masami Hiramatsu wrote: On Tue, 26 Dec 2017 18:12:56 -0800 Alexei Starovoitov wrote: On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote: Support in-kernel fault-injection framework via debugfs. This allows you to inject a conditional error to specified function using debugfs interfaces. Signed-off-by: Masami Hiramatsu --- Documentation/fault-injection/fault-injection.txt |5 + kernel/Makefile |1 kernel/fail_function.c| 169 + lib/Kconfig.debug | 10 + 4 files changed, 185 insertions(+) create mode 100644 kernel/fail_function.c diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt index 918972babcd8..6243a588dd71 100644 --- a/Documentation/fault-injection/fault-injection.txt +++ b/Documentation/fault-injection/fault-injection.txt @@ -30,6 +30,11 @@ o fail_mmc_request injects MMC data errors on devices permitted by setting debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request +o fail_function + + injects error return on specific functions by setting debugfs entries + under /sys/kernel/debug/fail_function. No boot option supported. I like it. Could you document it a bit better? Yes, I will do in next series. In particular retval is configurable, but without an example no one will be able to figure out how to use it. Ah, right. BTW, as I pointed in the covermail, should we store the expected error value range into the injectable list? e.g. ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO) And provide APIs to check/get it. I'm afraid such check would be too costly. Right now we have only two functions marked but I expect hundreds more will be added in the near future as soon as developers realize the potential of such error injection. All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data. Multiple by 1k and we have 8k of data spent on marks. If we add max/min range marks that doubles it for very little use. I think marking function only is enough. Sorry, I don't think so. Even if it takes 16 bytes more for each points, I don't think it is any overhead for machines in these days. Even if so, we can provide a kconfig to reduce it. I mean, we are living in GB-order memory are, and it will be bigger in the future. Why we have to worry about hundreds of 16bytes memory pieces? It will take a few KB, and even if we mark thousands of functions, it never reaches 1MB, in GB memory pool. :) Of course, for many small-footprint embedded devices (like having less than 128MB memory), this feature can be a overhead. But they can cut off the table by kconfig. I still disagree on wasting 16-byte * num_of_funcs of .data here. The trade-off of usability vs memory just not worth it. Sorry. Please focus on testing your changes instead.
Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework
On 12/27/17 5:38 PM, Masami Hiramatsu wrote: On Wed, 27 Dec 2017 14:49:46 -0800 Alexei Starovoitov wrote: On 12/27/17 12:09 AM, Masami Hiramatsu wrote: On Tue, 26 Dec 2017 18:12:56 -0800 Alexei Starovoitov wrote: On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote: Support in-kernel fault-injection framework via debugfs. This allows you to inject a conditional error to specified function using debugfs interfaces. Signed-off-by: Masami Hiramatsu --- Documentation/fault-injection/fault-injection.txt |5 + kernel/Makefile |1 kernel/fail_function.c| 169 + lib/Kconfig.debug | 10 + 4 files changed, 185 insertions(+) create mode 100644 kernel/fail_function.c diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt index 918972babcd8..6243a588dd71 100644 --- a/Documentation/fault-injection/fault-injection.txt +++ b/Documentation/fault-injection/fault-injection.txt @@ -30,6 +30,11 @@ o fail_mmc_request injects MMC data errors on devices permitted by setting debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request +o fail_function + + injects error return on specific functions by setting debugfs entries + under /sys/kernel/debug/fail_function. No boot option supported. I like it. Could you document it a bit better? Yes, I will do in next series. In particular retval is configurable, but without an example no one will be able to figure out how to use it. Ah, right. BTW, as I pointed in the covermail, should we store the expected error value range into the injectable list? e.g. ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO) And provide APIs to check/get it. I'm afraid such check would be too costly. Right now we have only two functions marked but I expect hundreds more will be added in the near future as soon as developers realize the potential of such error injection. All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data. Multiple by 1k and we have 8k of data spent on marks. If we add max/min range marks that doubles it for very little use. I think marking function only is enough. Sorry, I don't think so. Even if it takes 16 bytes more for each points, I don't think it is any overhead for machines in these days. Even if so, we can provide a kconfig to reduce it. I mean, we are living in GB-order memory are, and it will be bigger in the future. Why we have to worry about hundreds of 16bytes memory pieces? It will take a few KB, and even if we mark thousands of functions, it never reaches 1MB, in GB memory pool. :) Of course, for many small-footprint embedded devices (like having less than 128MB memory), this feature can be a overhead. But they can cut off the table by kconfig. I still disagree on wasting 16-byte * num_of_funcs of .data here. The trade-off of usability vs memory just not worth it. Sorry. Please focus on testing your changes instead.
Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On 12/27/17 6:34 PM, Masami Hiramatsu wrote: On Wed, 27 Dec 2017 14:46:24 -0800 Alexei Starovoitovwrote: On 12/26/17 9:56 PM, Masami Hiramatsu wrote: On Tue, 26 Dec 2017 17:57:32 -0800 Alexei Starovoitov wrote: On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote: Check whether error injectable event is on function entry or not. Currently it checks the event is ftrace-based kprobes or not, but that is wrong. It should check if the event is on the entry of target function. Since error injection will override a function to just return with modified return value, that operation must be done before the target function starts making stackframe. As a side effect, bpf error injection is no need to depend on function-tracer. It can work with sw-breakpoint based kprobe events too. Signed-off-by: Masami Hiramatsu --- kernel/trace/Kconfig|2 -- kernel/trace/bpf_trace.c|6 +++--- kernel/trace/trace_kprobe.c |8 +--- kernel/trace/trace_probe.h | 12 ++-- 4 files changed, 14 insertions(+), 14 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index ae3a2d519e50..6400e1bf97c5 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -533,9 +533,7 @@ config FUNCTION_PROFILER config BPF_KPROBE_OVERRIDE bool "Enable BPF programs to override a kprobed function" depends on BPF_EVENTS - depends on KPROBES_ON_FTRACE depends on HAVE_KPROBE_OVERRIDE - depends on DYNAMIC_FTRACE_WITH_REGS default n help Allows BPF to override the execution of a probed function and diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index f6d2327ecb59..d663660f8392 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event, int ret = -EEXIST; /* -* Kprobe override only works for ftrace based kprobes, and only if they -* are on the opt-in list. +* Kprobe override only works if they are on the function entry, +* and only if they are on the opt-in list. */ if (prog->kprobe_override && - (!trace_kprobe_ftrace(event->tp_event) || + (!trace_kprobe_on_func_entry(event->tp_event) || !trace_kprobe_error_injectable(event->tp_event))) return -EINVAL; diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 91f4b57dab82..265e3e27e8dc 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk) return nhit; } -int trace_kprobe_ftrace(struct trace_event_call *call) +bool trace_kprobe_on_func_entry(struct trace_event_call *call) { struct trace_kprobe *tk = (struct trace_kprobe *)call->data; - return kprobe_ftrace(>rp.kp); + + return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name, + tk->rp.kp.offset); That would be nice, but did you test this? Yes, because the jprobe, which was only official user of modifying execution path using kprobe, did same way to check. (and kretprobe also does it) My understanding that kprobe will restore all regs and here we need to override return ip _and_ value. yes, no problem. kprobe restore all regs from pt_regs, including regs->ip. Could you add a patch with the test the way Josef did or describe the steps to test this new mode? Would you mean below patch? If so, it should work without any change. [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return yeah. I expect bpf_override_return test to work as-is. I'm asking for the test for new functionality added by this patch. In particular kprobe on func entry without ftrace. How did you test it? This function is used in kretprobe and jprobe. Jprobe was the user of "modifying instruction pointer to another function" in kprobes. If it doesn't work, jprobe also doesn't work, this means you can not modify IP by kprobes anymore. Anyway, until linux-4.13, that was well tested by kprobe smoke test. and how I can repeat the test? I'm still not sure that it works correctly. That works correctly because it checks given address is on the entry point (the 1st instruction) of a function, using kallsyms. The reason why I made another flag for ftrace was, there are 2 modes for ftrace dynamic instrumentation, fentry and mcount. With new fentry mode, ftrace will be put on the first instruction of the function, so it will work as you expected. With traditional gcc mcount, ftrace will be called after making call frame for _mcount(). This means if you modify ip, it will not work or cause a trouble because _mcount call frame is still on the stack. So, current ftrace-based checker doesn't work, it depends on the case. Of course, in most
Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On 12/27/17 6:34 PM, Masami Hiramatsu wrote: On Wed, 27 Dec 2017 14:46:24 -0800 Alexei Starovoitov wrote: On 12/26/17 9:56 PM, Masami Hiramatsu wrote: On Tue, 26 Dec 2017 17:57:32 -0800 Alexei Starovoitov wrote: On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote: Check whether error injectable event is on function entry or not. Currently it checks the event is ftrace-based kprobes or not, but that is wrong. It should check if the event is on the entry of target function. Since error injection will override a function to just return with modified return value, that operation must be done before the target function starts making stackframe. As a side effect, bpf error injection is no need to depend on function-tracer. It can work with sw-breakpoint based kprobe events too. Signed-off-by: Masami Hiramatsu --- kernel/trace/Kconfig|2 -- kernel/trace/bpf_trace.c|6 +++--- kernel/trace/trace_kprobe.c |8 +--- kernel/trace/trace_probe.h | 12 ++-- 4 files changed, 14 insertions(+), 14 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index ae3a2d519e50..6400e1bf97c5 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -533,9 +533,7 @@ config FUNCTION_PROFILER config BPF_KPROBE_OVERRIDE bool "Enable BPF programs to override a kprobed function" depends on BPF_EVENTS - depends on KPROBES_ON_FTRACE depends on HAVE_KPROBE_OVERRIDE - depends on DYNAMIC_FTRACE_WITH_REGS default n help Allows BPF to override the execution of a probed function and diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index f6d2327ecb59..d663660f8392 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event, int ret = -EEXIST; /* -* Kprobe override only works for ftrace based kprobes, and only if they -* are on the opt-in list. +* Kprobe override only works if they are on the function entry, +* and only if they are on the opt-in list. */ if (prog->kprobe_override && - (!trace_kprobe_ftrace(event->tp_event) || + (!trace_kprobe_on_func_entry(event->tp_event) || !trace_kprobe_error_injectable(event->tp_event))) return -EINVAL; diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 91f4b57dab82..265e3e27e8dc 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk) return nhit; } -int trace_kprobe_ftrace(struct trace_event_call *call) +bool trace_kprobe_on_func_entry(struct trace_event_call *call) { struct trace_kprobe *tk = (struct trace_kprobe *)call->data; - return kprobe_ftrace(>rp.kp); + + return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name, + tk->rp.kp.offset); That would be nice, but did you test this? Yes, because the jprobe, which was only official user of modifying execution path using kprobe, did same way to check. (and kretprobe also does it) My understanding that kprobe will restore all regs and here we need to override return ip _and_ value. yes, no problem. kprobe restore all regs from pt_regs, including regs->ip. Could you add a patch with the test the way Josef did or describe the steps to test this new mode? Would you mean below patch? If so, it should work without any change. [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return yeah. I expect bpf_override_return test to work as-is. I'm asking for the test for new functionality added by this patch. In particular kprobe on func entry without ftrace. How did you test it? This function is used in kretprobe and jprobe. Jprobe was the user of "modifying instruction pointer to another function" in kprobes. If it doesn't work, jprobe also doesn't work, this means you can not modify IP by kprobes anymore. Anyway, until linux-4.13, that was well tested by kprobe smoke test. and how I can repeat the test? I'm still not sure that it works correctly. That works correctly because it checks given address is on the entry point (the 1st instruction) of a function, using kallsyms. The reason why I made another flag for ftrace was, there are 2 modes for ftrace dynamic instrumentation, fentry and mcount. With new fentry mode, ftrace will be put on the first instruction of the function, so it will work as you expected. With traditional gcc mcount, ftrace will be called after making call frame for _mcount(). This means if you modify ip, it will not work or cause a trouble because _mcount call frame is still on the stack. So, current ftrace-based checker doesn't work, it depends on the case. Of course, in most case, kernel will be build in new gcc which supports fentry, but
Re: [PATCH] objtool: Fix clang enum conversion warning
On Wed, Dec 27, 2017 at 12:38 PM, Josh Poimboeufwrote: > On Wed, Dec 27, 2017 at 01:34:34PM +0100, Lukas Bulwahn wrote: >> Assuming that the authorship of this one-line change does not matter, as it >> is largely suggested by the clang compiler anyway, and we want to move the >> change forward, we should decide on which of three patches to move >> forward. I can give my Reviewed-by and Tested-by to any of them. I suppose Ingo would take the first and accumulate Reviewed-By tags. I don't particularly care about authorship (please just fix the bug). > The patch from Lukas was the first one I received, so that's the one I > used. I rewrote the commit msg for clarity and added my SOB and sent it > to Ingo for merging. I think you should have kept Nicholas Mc Guire's Reviewed by tag? Maybe Ingo can re-add his and mine when merging?
Re: [PATCH] objtool: Fix clang enum conversion warning
On Wed, Dec 27, 2017 at 12:38 PM, Josh Poimboeuf wrote: > On Wed, Dec 27, 2017 at 01:34:34PM +0100, Lukas Bulwahn wrote: >> Assuming that the authorship of this one-line change does not matter, as it >> is largely suggested by the clang compiler anyway, and we want to move the >> change forward, we should decide on which of three patches to move >> forward. I can give my Reviewed-by and Tested-by to any of them. I suppose Ingo would take the first and accumulate Reviewed-By tags. I don't particularly care about authorship (please just fix the bug). > The patch from Lukas was the first one I received, so that's the one I > used. I rewrote the commit msg for clarity and added my SOB and sent it > to Ingo for merging. I think you should have kept Nicholas Mc Guire's Reviewed by tag? Maybe Ingo can re-add his and mine when merging?
[PATCH -next] xen/pvcalls: use GFP_ATOMIC under spin lock
A spin lock is taken here so we should use GFP_ATOMIC. Fixes: 9774c6cca266 ("xen/pvcalls: implement accept command") Signed-off-by: Wei Yongjun--- drivers/xen/pvcalls-front.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 0c1ec68..dfd00d9 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -807,7 +807,7 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags) pvcalls_exit(); return ret; } - map2 = kzalloc(sizeof(*map2), GFP_KERNEL); + map2 = kzalloc(sizeof(*map2), GFP_ATOMIC); if (map2 == NULL) { clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, (void *)>passive.flags);
[PATCH -next] xen/pvcalls: use GFP_ATOMIC under spin lock
A spin lock is taken here so we should use GFP_ATOMIC. Fixes: 9774c6cca266 ("xen/pvcalls: implement accept command") Signed-off-by: Wei Yongjun --- drivers/xen/pvcalls-front.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 0c1ec68..dfd00d9 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -807,7 +807,7 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags) pvcalls_exit(); return ret; } - map2 = kzalloc(sizeof(*map2), GFP_KERNEL); + map2 = kzalloc(sizeof(*map2), GFP_ATOMIC); if (map2 == NULL) { clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, (void *)>passive.flags);