Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-27 Thread Masami Hiramatsu
On Wed, 27 Dec 2017 19:45:42 -0800
Alexei Starovoitov  wrote:

> On 12/27/17 6:34 PM, Masami Hiramatsu wrote:
> > On Wed, 27 Dec 2017 14:46:24 -0800
> > Alexei Starovoitov  wrote:
> >
> >> On 12/26/17 9:56 PM, Masami Hiramatsu wrote:
> >>> On Tue, 26 Dec 2017 17:57:32 -0800
> >>> Alexei Starovoitov  wrote:
> >>>
>  On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:
> > Check whether error injectable event is on function entry or not.
> > Currently it checks the event is ftrace-based kprobes or not,
> > but that is wrong. It should check if the event is on the entry
> > of target function. Since error injection will override a function
> > to just return with modified return value, that operation must
> > be done before the target function starts making stackframe.
> >
> > As a side effect, bpf error injection is no need to depend on
> > function-tracer. It can work with sw-breakpoint based kprobe
> > events too.
> >
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  kernel/trace/Kconfig|2 --
> >  kernel/trace/bpf_trace.c|6 +++---
> >  kernel/trace/trace_kprobe.c |8 +---
> >  kernel/trace/trace_probe.h  |   12 ++--
> >  4 files changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index ae3a2d519e50..6400e1bf97c5 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -533,9 +533,7 @@ config FUNCTION_PROFILER
> >  config BPF_KPROBE_OVERRIDE
> > bool "Enable BPF programs to override a kprobed function"
> > depends on BPF_EVENTS
> > -   depends on KPROBES_ON_FTRACE
> > depends on HAVE_KPROBE_OVERRIDE
> > -   depends on DYNAMIC_FTRACE_WITH_REGS
> > default n
> > help
> >  Allows BPF to override the execution of a probed function and
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index f6d2327ecb59..d663660f8392 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event 
> > *event,
> > int ret = -EEXIST;
> >
> > /*
> > -* Kprobe override only works for ftrace based kprobes, and 
> > only if they
> > -* are on the opt-in list.
> > +* Kprobe override only works if they are on the function entry,
> > +* and only if they are on the opt-in list.
> >  */
> > if (prog->kprobe_override &&
> > -   (!trace_kprobe_ftrace(event->tp_event) ||
> > +   (!trace_kprobe_on_func_entry(event->tp_event) ||
> >  !trace_kprobe_error_injectable(event->tp_event)))
> > return -EINVAL;
> >
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index 91f4b57dab82..265e3e27e8dc 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long 
> > trace_kprobe_nhit(struct trace_kprobe *tk)
> > return nhit;
> >  }
> >
> > -int trace_kprobe_ftrace(struct trace_event_call *call)
> > +bool trace_kprobe_on_func_entry(struct trace_event_call *call)
> >  {
> > struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> > -   return kprobe_ftrace(>rp.kp);
> > +
> > +   return kprobe_on_func_entry(tk->rp.kp.addr, 
> > tk->rp.kp.symbol_name,
> > +   tk->rp.kp.offset);
> 
>  That would be nice, but did you test this?
> >>>
> >>> Yes, because the jprobe, which was only official user of modifying 
> >>> execution
> >>> path using kprobe, did same way to check. (and kretprobe also does it)
> >>>
>  My understanding that kprobe will restore all regs and
>  here we need to override return ip _and_ value.
> >>>
> >>> yes, no problem. kprobe restore all regs from pt_regs, including regs->ip.
> >>>
>  Could you add a patch with the test the way Josef did
>  or describe the steps to test this new mode?
> >>>
> >>> Would you mean below patch? If so, it should work without any change.
> >>>
> >>>  [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return
> >>
> >> yeah. I expect bpf_override_return test to work as-is.
> >> I'm asking for the test for new functionality added by this patch.
> >> In particular kprobe on func entry without ftrace.
> >> How did you test it?
> >
> > This function is used in kretprobe and jprobe. Jprobe was the user of
> > "modifying instruction pointer to another function" in kprobes.
> > If it doesn't work, jprobe also doesn't work, this means you can not
> > modify IP by 

Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-27 Thread Masami Hiramatsu
On Wed, 27 Dec 2017 19:45:42 -0800
Alexei Starovoitov  wrote:

> On 12/27/17 6:34 PM, Masami Hiramatsu wrote:
> > On Wed, 27 Dec 2017 14:46:24 -0800
> > Alexei Starovoitov  wrote:
> >
> >> On 12/26/17 9:56 PM, Masami Hiramatsu wrote:
> >>> On Tue, 26 Dec 2017 17:57:32 -0800
> >>> Alexei Starovoitov  wrote:
> >>>
>  On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:
> > Check whether error injectable event is on function entry or not.
> > Currently it checks the event is ftrace-based kprobes or not,
> > but that is wrong. It should check if the event is on the entry
> > of target function. Since error injection will override a function
> > to just return with modified return value, that operation must
> > be done before the target function starts making stackframe.
> >
> > As a side effect, bpf error injection is no need to depend on
> > function-tracer. It can work with sw-breakpoint based kprobe
> > events too.
> >
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  kernel/trace/Kconfig|2 --
> >  kernel/trace/bpf_trace.c|6 +++---
> >  kernel/trace/trace_kprobe.c |8 +---
> >  kernel/trace/trace_probe.h  |   12 ++--
> >  4 files changed, 14 insertions(+), 14 deletions(-)
> >
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index ae3a2d519e50..6400e1bf97c5 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -533,9 +533,7 @@ config FUNCTION_PROFILER
> >  config BPF_KPROBE_OVERRIDE
> > bool "Enable BPF programs to override a kprobed function"
> > depends on BPF_EVENTS
> > -   depends on KPROBES_ON_FTRACE
> > depends on HAVE_KPROBE_OVERRIDE
> > -   depends on DYNAMIC_FTRACE_WITH_REGS
> > default n
> > help
> >  Allows BPF to override the execution of a probed function and
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index f6d2327ecb59..d663660f8392 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event 
> > *event,
> > int ret = -EEXIST;
> >
> > /*
> > -* Kprobe override only works for ftrace based kprobes, and 
> > only if they
> > -* are on the opt-in list.
> > +* Kprobe override only works if they are on the function entry,
> > +* and only if they are on the opt-in list.
> >  */
> > if (prog->kprobe_override &&
> > -   (!trace_kprobe_ftrace(event->tp_event) ||
> > +   (!trace_kprobe_on_func_entry(event->tp_event) ||
> >  !trace_kprobe_error_injectable(event->tp_event)))
> > return -EINVAL;
> >
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index 91f4b57dab82..265e3e27e8dc 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long 
> > trace_kprobe_nhit(struct trace_kprobe *tk)
> > return nhit;
> >  }
> >
> > -int trace_kprobe_ftrace(struct trace_event_call *call)
> > +bool trace_kprobe_on_func_entry(struct trace_event_call *call)
> >  {
> > struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> > -   return kprobe_ftrace(>rp.kp);
> > +
> > +   return kprobe_on_func_entry(tk->rp.kp.addr, 
> > tk->rp.kp.symbol_name,
> > +   tk->rp.kp.offset);
> 
>  That would be nice, but did you test this?
> >>>
> >>> Yes, because the jprobe, which was only official user of modifying 
> >>> execution
> >>> path using kprobe, did same way to check. (and kretprobe also does it)
> >>>
>  My understanding that kprobe will restore all regs and
>  here we need to override return ip _and_ value.
> >>>
> >>> yes, no problem. kprobe restore all regs from pt_regs, including regs->ip.
> >>>
>  Could you add a patch with the test the way Josef did
>  or describe the steps to test this new mode?
> >>>
> >>> Would you mean below patch? If so, it should work without any change.
> >>>
> >>>  [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return
> >>
> >> yeah. I expect bpf_override_return test to work as-is.
> >> I'm asking for the test for new functionality added by this patch.
> >> In particular kprobe on func entry without ftrace.
> >> How did you test it?
> >
> > This function is used in kretprobe and jprobe. Jprobe was the user of
> > "modifying instruction pointer to another function" in kprobes.
> > If it doesn't work, jprobe also doesn't work, this means you can not
> > modify IP by kprobes anymore.
> > Anyway, until linux-4.13, that was well tested by kprobe smoke 

Re: WARNING in strp_data_ready

2017-12-27 Thread Dmitry Vyukov
On Thu, Dec 28, 2017 at 2:19 AM, Tom Herbert  wrote:
> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
>>
>>
>> 27.12.2017, 23:14, "Dmitry Vyukov" :
>>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
  27.12.2017, 22:21, "Dmitry Vyukov" :
>  On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  
> wrote:
>>   Did you try the patch I posted?
>
>  Hi Tom,

  Hello Dmitry,

>  No. And I didn't know I need to. Why?
>  If you think the patch needs additional testing, you can ask syzbot to
>  test it. See 
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>  Otherwise proceed with committing it. Or what are we waiting for?
>
>  Thanks

  I think we need to fixed patch for crash, in fact check to patch code and 
 test solve the bug.
  How do test it because there is no patch in the following bug?
>>>
>>> Hi Ozgur,
>>>
>>> I am not sure I completely understand what you mean. But the
>>> reproducer for this bug (which one can use for testing) is here:
>>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>>> Tom also mentions there is some patch for this, but I don't know where
>>> it is, it doesn't seem to be referenced from this thread.
>>
>> Hello Dmitry,
>>
>> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
>> I think Tom send patch to only you and are you tested?
>>
>> kcmsock.c will change and strp_data_ready I think locked.
>>
>> Tom, please send a patch for me? I can test and inform you.
>>
> Hi Ozgur,
>
> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you 
> can!

OK, I will work as your typist this time:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
master

But I wonder what part of the following you don't understand? Do we
need to improve wording or something?

> If you think the patch needs additional testing, you can ask syzbot to test 
> it.
> See 
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot

Also I don't know what git repo/branch you have in mind. Kernel
patches don't generally apply just to any tree. Fingers crossed that I
guessed correctly and it will apply.
diff --git a/include/net/sock.h b/include/net/sock.h
index 9155da422692..7a7b14e9628a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1514,6 +1514,11 @@ static inline bool sock_owned_by_user(const struct sock *sk)
 	return sk->sk_lock.owned;
 }
 
+static inline bool sock_owned_by_user_nocheck(const struct sock *sk)
+{
+	return sk->sk_lock.owned;
+}
+
 /* no reclassification while locks are held */
 static inline bool sock_allow_reclassification(const struct sock *csk)
 {
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index c5fda15ba319..1fdab5c4eda8 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -401,7 +401,7 @@ void strp_data_ready(struct strparser *strp)
 	 * allows a thread in BH context to safely check if the process
 	 * lock is held. In this case, if the lock is held, queue work.
 	 */
-	if (sock_owned_by_user(strp->sk)) {
+	if (sock_owned_by_user_nocheck(strp->sk)) {
 		queue_work(strp_wq, >work);
 		return;
 	}


Re: WARNING in strp_data_ready

2017-12-27 Thread Dmitry Vyukov
On Thu, Dec 28, 2017 at 2:19 AM, Tom Herbert  wrote:
> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
>>
>>
>> 27.12.2017, 23:14, "Dmitry Vyukov" :
>>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
  27.12.2017, 22:21, "Dmitry Vyukov" :
>  On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  
> wrote:
>>   Did you try the patch I posted?
>
>  Hi Tom,

  Hello Dmitry,

>  No. And I didn't know I need to. Why?
>  If you think the patch needs additional testing, you can ask syzbot to
>  test it. See 
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>  Otherwise proceed with committing it. Or what are we waiting for?
>
>  Thanks

  I think we need to fixed patch for crash, in fact check to patch code and 
 test solve the bug.
  How do test it because there is no patch in the following bug?
>>>
>>> Hi Ozgur,
>>>
>>> I am not sure I completely understand what you mean. But the
>>> reproducer for this bug (which one can use for testing) is here:
>>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>>> Tom also mentions there is some patch for this, but I don't know where
>>> it is, it doesn't seem to be referenced from this thread.
>>
>> Hello Dmitry,
>>
>> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
>> I think Tom send patch to only you and are you tested?
>>
>> kcmsock.c will change and strp_data_ready I think locked.
>>
>> Tom, please send a patch for me? I can test and inform you.
>>
> Hi Ozgur,
>
> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you 
> can!

OK, I will work as your typist this time:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
master

But I wonder what part of the following you don't understand? Do we
need to improve wording or something?

> If you think the patch needs additional testing, you can ask syzbot to test 
> it.
> See 
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot

Also I don't know what git repo/branch you have in mind. Kernel
patches don't generally apply just to any tree. Fingers crossed that I
guessed correctly and it will apply.
diff --git a/include/net/sock.h b/include/net/sock.h
index 9155da422692..7a7b14e9628a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1514,6 +1514,11 @@ static inline bool sock_owned_by_user(const struct sock *sk)
 	return sk->sk_lock.owned;
 }
 
+static inline bool sock_owned_by_user_nocheck(const struct sock *sk)
+{
+	return sk->sk_lock.owned;
+}
+
 /* no reclassification while locks are held */
 static inline bool sock_allow_reclassification(const struct sock *csk)
 {
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index c5fda15ba319..1fdab5c4eda8 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -401,7 +401,7 @@ void strp_data_ready(struct strparser *strp)
 	 * allows a thread in BH context to safely check if the process
 	 * lock is held. In this case, if the lock is held, queue work.
 	 */
-	if (sock_owned_by_user(strp->sk)) {
+	if (sock_owned_by_user_nocheck(strp->sk)) {
 		queue_work(strp_wq, >work);
 		return;
 	}


Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework

2017-12-27 Thread Masami Hiramatsu
On Wed, 27 Dec 2017 19:49:28 -0800
Alexei Starovoitov  wrote:

> On 12/27/17 5:38 PM, Masami Hiramatsu wrote:
> > On Wed, 27 Dec 2017 14:49:46 -0800
> > Alexei Starovoitov  wrote:
> >
> >> On 12/27/17 12:09 AM, Masami Hiramatsu wrote:
> >>> On Tue, 26 Dec 2017 18:12:56 -0800
> >>> Alexei Starovoitov  wrote:
> >>>
>  On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:
> > Support in-kernel fault-injection framework via debugfs.
> > This allows you to inject a conditional error to specified
> > function using debugfs interfaces.
> >
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  Documentation/fault-injection/fault-injection.txt |5 +
> >  kernel/Makefile   |1
> >  kernel/fail_function.c|  169 
> > +
> >  lib/Kconfig.debug |   10 +
> >  4 files changed, 185 insertions(+)
> >  create mode 100644 kernel/fail_function.c
> >
> > diff --git a/Documentation/fault-injection/fault-injection.txt 
> > b/Documentation/fault-injection/fault-injection.txt
> > index 918972babcd8..6243a588dd71 100644
> > --- a/Documentation/fault-injection/fault-injection.txt
> > +++ b/Documentation/fault-injection/fault-injection.txt
> > @@ -30,6 +30,11 @@ o fail_mmc_request
> >injects MMC data errors on devices permitted by setting
> >debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
> >
> > +o fail_function
> > +
> > +  injects error return on specific functions by setting debugfs entries
> > +  under /sys/kernel/debug/fail_function. No boot option supported.
> 
>  I like it.
>  Could you document it a bit better?
> >>>
> >>> Yes, I will do in next series.
> >>>
>  In particular retval is configurable, but without an example no one
>  will be able to figure out how to use it.
> >>>
> >>> Ah, right. BTW, as I pointed in the covermail, should we store the
> >>> expected error value range into the injectable list? e.g.
> >>>
> >>> ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)
> >>>
> >>> And provide APIs to check/get it.
> >>
> >> I'm afraid such check would be too costly.
> >> Right now we have only two functions marked but I expect hundreds more
> >> will be added in the near future as soon as developers realize the
> >> potential of such error injection.
> >> All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data.
> >> Multiple by 1k and we have 8k of data spent on marks.
> >> If we add max/min range marks that doubles it for very little use.
> >> I think marking function only is enough.
> >
> > Sorry, I don't think so.
> > Even if it takes 16 bytes more for each points, I don't think it is
> > any overhead for machines in these days. Even if so, we can provide
> > a kconfig to reduce it.
> > I mean, we are living in GB-order memory are, and it will be bigger
> > in the future. Why we have to worry about hundreds of 16bytes memory
> > pieces? It will take a few KB, and even if we mark thousands of
> > functions, it never reaches 1MB, in GB memory pool. :)
> >
> > Of course, for many small-footprint embedded devices (like having
> > less than 128MB memory), this feature can be a overhead. But they
> > can cut off the table by kconfig.
> 
> I still disagree on wasting 16-byte * num_of_funcs of .data here.
> The trade-off of usability vs memory just not worth it. Sorry.
> Please focus on testing your changes instead.

Then what happen if the user set invalid retval to those functions?
even if we limit the injectable functions, it can cause a problem,

for example, 

 obj = func_return_object();
 if (!obj) {
handling_error...;
 }
 obj->field = x;

In this case, obviously func_return_object() must return NULL if there is
an error, not -ENOMEM. But without the correct retval information, how would
you check the BPF code doesn't cause a trouble?
Currently it seems you are expecting only the functions which return error code.

 ret = func_return_state();
 if (ret < 0) {
handling_error...;
 }

But how we can distinguish those?

If we have the error range for each function, we can ensure what is
*correct* error code, NULL or errno, or any other error numbers. :)

At least fail_function needs this feature because it can check
return value when setting it up.

Thank you,

-- 
Masami Hiramatsu 


Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework

2017-12-27 Thread Masami Hiramatsu
On Wed, 27 Dec 2017 19:49:28 -0800
Alexei Starovoitov  wrote:

> On 12/27/17 5:38 PM, Masami Hiramatsu wrote:
> > On Wed, 27 Dec 2017 14:49:46 -0800
> > Alexei Starovoitov  wrote:
> >
> >> On 12/27/17 12:09 AM, Masami Hiramatsu wrote:
> >>> On Tue, 26 Dec 2017 18:12:56 -0800
> >>> Alexei Starovoitov  wrote:
> >>>
>  On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:
> > Support in-kernel fault-injection framework via debugfs.
> > This allows you to inject a conditional error to specified
> > function using debugfs interfaces.
> >
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  Documentation/fault-injection/fault-injection.txt |5 +
> >  kernel/Makefile   |1
> >  kernel/fail_function.c|  169 
> > +
> >  lib/Kconfig.debug |   10 +
> >  4 files changed, 185 insertions(+)
> >  create mode 100644 kernel/fail_function.c
> >
> > diff --git a/Documentation/fault-injection/fault-injection.txt 
> > b/Documentation/fault-injection/fault-injection.txt
> > index 918972babcd8..6243a588dd71 100644
> > --- a/Documentation/fault-injection/fault-injection.txt
> > +++ b/Documentation/fault-injection/fault-injection.txt
> > @@ -30,6 +30,11 @@ o fail_mmc_request
> >injects MMC data errors on devices permitted by setting
> >debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
> >
> > +o fail_function
> > +
> > +  injects error return on specific functions by setting debugfs entries
> > +  under /sys/kernel/debug/fail_function. No boot option supported.
> 
>  I like it.
>  Could you document it a bit better?
> >>>
> >>> Yes, I will do in next series.
> >>>
>  In particular retval is configurable, but without an example no one
>  will be able to figure out how to use it.
> >>>
> >>> Ah, right. BTW, as I pointed in the covermail, should we store the
> >>> expected error value range into the injectable list? e.g.
> >>>
> >>> ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)
> >>>
> >>> And provide APIs to check/get it.
> >>
> >> I'm afraid such check would be too costly.
> >> Right now we have only two functions marked but I expect hundreds more
> >> will be added in the near future as soon as developers realize the
> >> potential of such error injection.
> >> All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data.
> >> Multiple by 1k and we have 8k of data spent on marks.
> >> If we add max/min range marks that doubles it for very little use.
> >> I think marking function only is enough.
> >
> > Sorry, I don't think so.
> > Even if it takes 16 bytes more for each points, I don't think it is
> > any overhead for machines in these days. Even if so, we can provide
> > a kconfig to reduce it.
> > I mean, we are living in GB-order memory are, and it will be bigger
> > in the future. Why we have to worry about hundreds of 16bytes memory
> > pieces? It will take a few KB, and even if we mark thousands of
> > functions, it never reaches 1MB, in GB memory pool. :)
> >
> > Of course, for many small-footprint embedded devices (like having
> > less than 128MB memory), this feature can be a overhead. But they
> > can cut off the table by kconfig.
> 
> I still disagree on wasting 16-byte * num_of_funcs of .data here.
> The trade-off of usability vs memory just not worth it. Sorry.
> Please focus on testing your changes instead.

Then what happen if the user set invalid retval to those functions?
even if we limit the injectable functions, it can cause a problem,

for example, 

 obj = func_return_object();
 if (!obj) {
handling_error...;
 }
 obj->field = x;

In this case, obviously func_return_object() must return NULL if there is
an error, not -ENOMEM. But without the correct retval information, how would
you check the BPF code doesn't cause a trouble?
Currently it seems you are expecting only the functions which return error code.

 ret = func_return_state();
 if (ret < 0) {
handling_error...;
 }

But how we can distinguish those?

If we have the error range for each function, we can ensure what is
*correct* error code, NULL or errno, or any other error numbers. :)

At least fail_function needs this feature because it can check
return value when setting it up.

Thank you,

-- 
Masami Hiramatsu 


[PATCH v2] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Gang He
If we can't get inode lock immediately in the function
ocfs2_inode_lock_with_page() when reading a page, we should not
return directly here, since this will lead to a softlockup problem
when the kernel is configured with CONFIG_PREEMPT is not set.
The method is to get a blocking lock and immediately unlock before
returning, this can avoid CPU resource waste due to lots of retries,
and benefits fairness in getting lock among multiple nodes, increase
efficiency in case modifying the same file frequently from multiple
nodes.
The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1)
looks like,
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
  
  dump_stack+0x5c/0x82
  panic+0xd5/0x21e
  watchdog_timer_fn+0x208/0x210
  ? watchdog_park_threads+0x70/0x70
  __hrtimer_run_queues+0xcc/0x200
  hrtimer_interrupt+0xa6/0x1f0
  smp_apic_timer_interrupt+0x34/0x50
  apic_timer_interrupt+0x96/0xa0
  
 RIP: 0010:unlock_page+0x17/0x30
 RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
 RAX: dead0100 RBX: f21e009f5300 RCX: 0004
 RDX: dead00ff RSI: 0202 RDI: f21e009f5300
 RBP:  R08:  R09: af154080bb00
 R10: af154080bc30 R11: 0040 R12: 993749a39518
 R13:  R14: f21e009f5300 R15: f21e009f5300
  ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
  ocfs2_readpage+0x41/0x2d0 [ocfs2]
  ? pagecache_get_page+0x30/0x200
  filemap_fault+0x12b/0x5c0
  ? recalc_sigpending+0x17/0x50
  ? __set_task_blocked+0x28/0x70
  ? __set_current_blocked+0x3d/0x60
  ocfs2_fault+0x29/0xb0 [ocfs2]
  __do_fault+0x1a/0xa0
  __handle_mm_fault+0xbe8/0x1090
  handle_mm_fault+0xaa/0x1f0
  __do_page_fault+0x235/0x4b0
  trace_do_page_fault+0x3c/0x110
  async_page_fault+0x28/0x30
 RIP: 0033:0x7fa75ded638e
 RSP: 002b:7ffd6657db18 EFLAGS: 00010287
 RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
 RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
 RBP: 0003 R08: 000e R09: 
 R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
 R13: 000e R14: 1770 R15: 

About performance improvement, we can see the testing time is reduced,
and CPU utilization decreases, the detailed data is as follows.
I ran multi_mmap test case in ocfs2-test package in a three nodes cluster.
Before apply this patch,
  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND
 2754 ocfs2te+  20   0  170248   6980   4856 D 80.73 0.341   0:18.71 multi_mmap
 1505 root  rt   0  36 123060  97224 S 2.658 6.015   0:01.44 corosync
5 root  20   0   0  0  0 S 1.329 0.000   0:00.19 
kworker/u8:0
   95 root  20   0   0  0  0 S 1.329 0.000   0:00.25 
kworker/u8:1
 2728 root  20   0   0  0  0 S 0.997 0.000   0:00.24 
jbd2/sda1-33
 2721 root  20   0   0  0  0 S 0.664 0.000   0:00.07 
ocfs2dc-3C8CFD4
 2750 ocfs2te+  20   0  142976   4652   3532 S 0.664 0.227   0:00.28 mpirun

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o 
~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d 
/dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 14:44:52 CST 2017
multi_mmap..Passed.
Runtime 783 seconds.

After apply this patch,
  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND
 2508 ocfs2te+  20   0  170248   6804   4680 R 54.00 0.333   0:55.37 multi_mmap
  155 root  20   0   0  0  0 S 2.667 0.000   0:01.20 
kworker/u8:3
   95 root  20   0   0  0  0 S 2.000 0.000   0:01.58 
kworker/u8:1
 2504 ocfs2te+  20   0  142976   4604   3480 R 1.667 0.225   0:01.65 mpirun
5 root  20   0   0  0  0 S 1.000 0.000   0:01.36 
kworker/u8:0
 2482 root  20   0   0  0  0 S 1.000 0.000   0:00.86 
jbd2/sda1-33
  299 root   0 -20   0  0  0 S 0.333 0.000   0:00.13 
kworker/2:1H
  335 root   0 -20   0  0  0 S 0.333 0.000   0:00.17 
kworker/1:1H
  535 root  20   0   12140   7268   1456 S 0.333 0.355   0:00.34 haveged
 1282 root  rt   0  84 123108  97224 S 0.333 6.017   0:01.33 corosync

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o 
~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d 
/dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 15:04:12 CST 2017
multi_mmap..Passed.
Runtime 487 seconds.

Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He 
---
 fs/ocfs2/dlmglue.c | 9 +
 1 file changed, 9 

[PATCH v2] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Gang He
If we can't get inode lock immediately in the function
ocfs2_inode_lock_with_page() when reading a page, we should not
return directly here, since this will lead to a softlockup problem
when the kernel is configured with CONFIG_PREEMPT is not set.
The method is to get a blocking lock and immediately unlock before
returning, this can avoid CPU resource waste due to lots of retries,
and benefits fairness in getting lock among multiple nodes, increase
efficiency in case modifying the same file frequently from multiple
nodes.
The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1)
looks like,
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
  
  dump_stack+0x5c/0x82
  panic+0xd5/0x21e
  watchdog_timer_fn+0x208/0x210
  ? watchdog_park_threads+0x70/0x70
  __hrtimer_run_queues+0xcc/0x200
  hrtimer_interrupt+0xa6/0x1f0
  smp_apic_timer_interrupt+0x34/0x50
  apic_timer_interrupt+0x96/0xa0
  
 RIP: 0010:unlock_page+0x17/0x30
 RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
 RAX: dead0100 RBX: f21e009f5300 RCX: 0004
 RDX: dead00ff RSI: 0202 RDI: f21e009f5300
 RBP:  R08:  R09: af154080bb00
 R10: af154080bc30 R11: 0040 R12: 993749a39518
 R13:  R14: f21e009f5300 R15: f21e009f5300
  ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
  ocfs2_readpage+0x41/0x2d0 [ocfs2]
  ? pagecache_get_page+0x30/0x200
  filemap_fault+0x12b/0x5c0
  ? recalc_sigpending+0x17/0x50
  ? __set_task_blocked+0x28/0x70
  ? __set_current_blocked+0x3d/0x60
  ocfs2_fault+0x29/0xb0 [ocfs2]
  __do_fault+0x1a/0xa0
  __handle_mm_fault+0xbe8/0x1090
  handle_mm_fault+0xaa/0x1f0
  __do_page_fault+0x235/0x4b0
  trace_do_page_fault+0x3c/0x110
  async_page_fault+0x28/0x30
 RIP: 0033:0x7fa75ded638e
 RSP: 002b:7ffd6657db18 EFLAGS: 00010287
 RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
 RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
 RBP: 0003 R08: 000e R09: 
 R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
 R13: 000e R14: 1770 R15: 

About performance improvement, we can see the testing time is reduced,
and CPU utilization decreases, the detailed data is as follows.
I ran multi_mmap test case in ocfs2-test package in a three nodes cluster.
Before apply this patch,
  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND
 2754 ocfs2te+  20   0  170248   6980   4856 D 80.73 0.341   0:18.71 multi_mmap
 1505 root  rt   0  36 123060  97224 S 2.658 6.015   0:01.44 corosync
5 root  20   0   0  0  0 S 1.329 0.000   0:00.19 
kworker/u8:0
   95 root  20   0   0  0  0 S 1.329 0.000   0:00.25 
kworker/u8:1
 2728 root  20   0   0  0  0 S 0.997 0.000   0:00.24 
jbd2/sda1-33
 2721 root  20   0   0  0  0 S 0.664 0.000   0:00.07 
ocfs2dc-3C8CFD4
 2750 ocfs2te+  20   0  142976   4652   3532 S 0.664 0.227   0:00.28 mpirun

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o 
~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d 
/dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 14:44:52 CST 2017
multi_mmap..Passed.
Runtime 783 seconds.

After apply this patch,
  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND
 2508 ocfs2te+  20   0  170248   6804   4680 R 54.00 0.333   0:55.37 multi_mmap
  155 root  20   0   0  0  0 S 2.667 0.000   0:01.20 
kworker/u8:3
   95 root  20   0   0  0  0 S 2.000 0.000   0:01.58 
kworker/u8:1
 2504 ocfs2te+  20   0  142976   4604   3480 R 1.667 0.225   0:01.65 mpirun
5 root  20   0   0  0  0 S 1.000 0.000   0:01.36 
kworker/u8:0
 2482 root  20   0   0  0  0 S 1.000 0.000   0:00.86 
jbd2/sda1-33
  299 root   0 -20   0  0  0 S 0.333 0.000   0:00.13 
kworker/2:1H
  335 root   0 -20   0  0  0 S 0.333 0.000   0:00.17 
kworker/1:1H
  535 root  20   0   12140   7268   1456 S 0.333 0.355   0:00.34 haveged
 1282 root  rt   0  84 123108  97224 S 0.333 6.017   0:01.33 corosync

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o 
~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d 
/dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 15:04:12 CST 2017
multi_mmap..Passed.
Runtime 487 seconds.

Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He 
---
 fs/ocfs2/dlmglue.c | 9 +
 1 file changed, 9 insertions(+)


Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-27 Thread Andrew Lunn
On Wed, Dec 27, 2017 at 10:24:01PM +, Russell King - ARM Linux wrote:
> On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote:
> > This patch enables the fourth network interface on the Marvell
> > Macchiatobin. It is configured in the 2500Base-X PHY mode.
> > 
> > Signed-off-by: Antoine Tenart 
> > ---
> >  arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts 
> > b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> > index b3350827ee55..c51efd511324 100644
> > --- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> > +++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> > @@ -279,6 +279,14 @@
> > phys = <_comphy0 1>;
> >  };
> >  
> > +_eth2 {
> > +   /* CPS Lane 5 */
> > +   status = "okay";
> > +   phy-mode = "2500base-x";
> > +   /* Generic PHY, providing serdes lanes */
> > +   phys = <_comphy5 2>;
> > +};
> > +
> 
> This is wrong.  This lane is connected to a SFP cage which can support
> more than just 2500base-X.  Tying it in this way to 2500base-X means
> that this port does not support conenctions at 1000base-X, despite
> that's one of the most popular and more standardised speeds.
> 

Hi Antoine

I agree with Russell here. SFP modules are hot pluggable, and support
a range of interface modes. You need to query what the SFP module is
in order to know how to configure the SERDES interface. The phylink
infrastructure does that for you.

   Andrew



Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-27 Thread Andrew Lunn
On Wed, Dec 27, 2017 at 10:24:01PM +, Russell King - ARM Linux wrote:
> On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote:
> > This patch enables the fourth network interface on the Marvell
> > Macchiatobin. It is configured in the 2500Base-X PHY mode.
> > 
> > Signed-off-by: Antoine Tenart 
> > ---
> >  arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts 
> > b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> > index b3350827ee55..c51efd511324 100644
> > --- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> > +++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> > @@ -279,6 +279,14 @@
> > phys = <_comphy0 1>;
> >  };
> >  
> > +_eth2 {
> > +   /* CPS Lane 5 */
> > +   status = "okay";
> > +   phy-mode = "2500base-x";
> > +   /* Generic PHY, providing serdes lanes */
> > +   phys = <_comphy5 2>;
> > +};
> > +
> 
> This is wrong.  This lane is connected to a SFP cage which can support
> more than just 2500base-X.  Tying it in this way to 2500base-X means
> that this port does not support conenctions at 1000base-X, despite
> that's one of the most popular and more standardised speeds.
> 

Hi Antoine

I agree with Russell here. SFP modules are hot pluggable, and support
a range of interface modes. You need to query what the SFP module is
in order to know how to configure the SERDES interface. The phylink
infrastructure does that for you.

   Andrew



Re: mmots build error (2)

2017-12-27 Thread Dmitry Vyukov
On Thu, Dec 28, 2017 at 8:32 AM, syzbot
 wrote:
> Hello,
>
> syzkaller hit the following crash on
> 253f36c7c1aee654871b4f0c5e16ee6c396bff9a
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> Unfortunately, I don't have any reproducer for this bug yet.
>
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: 
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


+mm
the build is broken as:

failed to run /usr/bin/make [make bzImage -j 32
CC=/syzkaller/gcc/bin/gcc]: exit status 2
scripts/kconfig/conf  --silentoldconfig Kconfig
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  UPD include/config/kernel.release
  CHK include/generated/utsrelease.h
  UPD include/generated/utsrelease.h
  CC  kernel/bounds.s
  CHK include/generated/timeconst.h
  CC  scripts/mod/empty.o
  CC  scripts/mod/devicetable-offsets.s
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/modpost.o
  HOSTCC  scripts/mod/sumversion.o
  CHK scripts/mod/devicetable-offsets.h
  HOSTCC  scripts/mod/file2alias.o
  HOSTLD  scripts/mod/modpost
  CHK include/generated/bounds.h
  CC  arch/x86/kernel/asm-offsets.s
In file included from ./arch/x86/include/asm/pgtable_types.h:250:0,
 from ./arch/x86/include/asm/paravirt_types.h:45,
 from ./arch/x86/include/asm/ptrace.h:92,
 from ./arch/x86/include/asm/math_emu.h:5,
 from ./arch/x86/include/asm/processor.h:12,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:81,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/mmzone.h:8,
 from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from ./include/linux/crypto.h:24,
 from arch/x86/kernel/asm-offsets.c:9:
./arch/x86/include/asm/pgtable_64_types.h:97:1: error: version control
conflict marker in file
 <<< HEAD
 ^~~
In file included from ./arch/x86/include/asm/paravirt_types.h:45:0,
 from ./arch/x86/include/asm/ptrace.h:92,
 from ./arch/x86/include/asm/math_emu.h:5,
 from ./arch/x86/include/asm/processor.h:12,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:81,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/mmzone.h:8,
 from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from ./include/linux/crypto.h:24,
 from arch/x86/kernel/asm-offsets.c:9:
./arch/x86/include/asm/pgtable_types.h:266:47: warning: data
definition has no type or storage class
 typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
   ^~~~
./arch/x86/include/asm/pgtable_types.h:266:47: error: type defaults to
‘int’ in declaration of ‘pgprot_t’ [-Werror=implicit-int]
In file included from ./arch/x86/include/asm/paravirt_types.h:45:0,
 from ./arch/x86/include/asm/ptrace.h:92,
 from ./arch/x86/include/asm/math_emu.h:5,
 from ./arch/x86/include/asm/processor.h:12,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:81,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/mmzone.h:8,
 from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from ./include/linux/crypto.h:24,
 from arch/x86/kernel/asm-offsets.c:9:
./arch/x86/include/asm/pgtable_types.h:441:24: error: expected ‘=’,
‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘cachemode2pgprot’
 static inline pgprot_t cachemode2pgprot(enum page_cache_mode pcm)
^~~~
./arch/x86/include/asm/pgtable_types.h:445:53: error: expected
declaration specifiers or ‘...’ before ‘pgprot_t’
 static inline enum page_cache_mode pgprot2cachemode(pgprot_t pgprot)
  

Re: mmots build error (2)

2017-12-27 Thread Dmitry Vyukov
On Thu, Dec 28, 2017 at 8:32 AM, syzbot
 wrote:
> Hello,
>
> syzkaller hit the following crash on
> 253f36c7c1aee654871b4f0c5e16ee6c396bff9a
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> Unfortunately, I don't have any reproducer for this bug yet.
>
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: 
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


+mm
the build is broken as:

failed to run /usr/bin/make [make bzImage -j 32
CC=/syzkaller/gcc/bin/gcc]: exit status 2
scripts/kconfig/conf  --silentoldconfig Kconfig
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  UPD include/config/kernel.release
  CHK include/generated/utsrelease.h
  UPD include/generated/utsrelease.h
  CC  kernel/bounds.s
  CHK include/generated/timeconst.h
  CC  scripts/mod/empty.o
  CC  scripts/mod/devicetable-offsets.s
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/modpost.o
  HOSTCC  scripts/mod/sumversion.o
  CHK scripts/mod/devicetable-offsets.h
  HOSTCC  scripts/mod/file2alias.o
  HOSTLD  scripts/mod/modpost
  CHK include/generated/bounds.h
  CC  arch/x86/kernel/asm-offsets.s
In file included from ./arch/x86/include/asm/pgtable_types.h:250:0,
 from ./arch/x86/include/asm/paravirt_types.h:45,
 from ./arch/x86/include/asm/ptrace.h:92,
 from ./arch/x86/include/asm/math_emu.h:5,
 from ./arch/x86/include/asm/processor.h:12,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:81,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/mmzone.h:8,
 from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from ./include/linux/crypto.h:24,
 from arch/x86/kernel/asm-offsets.c:9:
./arch/x86/include/asm/pgtable_64_types.h:97:1: error: version control
conflict marker in file
 <<< HEAD
 ^~~
In file included from ./arch/x86/include/asm/paravirt_types.h:45:0,
 from ./arch/x86/include/asm/ptrace.h:92,
 from ./arch/x86/include/asm/math_emu.h:5,
 from ./arch/x86/include/asm/processor.h:12,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:81,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/mmzone.h:8,
 from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from ./include/linux/crypto.h:24,
 from arch/x86/kernel/asm-offsets.c:9:
./arch/x86/include/asm/pgtable_types.h:266:47: warning: data
definition has no type or storage class
 typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
   ^~~~
./arch/x86/include/asm/pgtable_types.h:266:47: error: type defaults to
‘int’ in declaration of ‘pgprot_t’ [-Werror=implicit-int]
In file included from ./arch/x86/include/asm/paravirt_types.h:45:0,
 from ./arch/x86/include/asm/ptrace.h:92,
 from ./arch/x86/include/asm/math_emu.h:5,
 from ./arch/x86/include/asm/processor.h:12,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:81,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/mmzone.h:8,
 from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from ./include/linux/crypto.h:24,
 from arch/x86/kernel/asm-offsets.c:9:
./arch/x86/include/asm/pgtable_types.h:441:24: error: expected ‘=’,
‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘cachemode2pgprot’
 static inline pgprot_t cachemode2pgprot(enum page_cache_mode pcm)
^~~~
./arch/x86/include/asm/pgtable_types.h:445:53: error: expected
declaration specifiers or ‘...’ before ‘pgprot_t’
 static inline enum page_cache_mode pgprot2cachemode(pgprot_t pgprot)
 ^~~~
./arch/x86/include/asm/pgtable_types.h:454:24: error: expected ‘=’,

Re: WARNING: kernel stack regs has bad 'bp' value (2)

2017-12-27 Thread Dmitry Vyukov
On Wed, Dec 27, 2017 at 7:29 PM, Dmitry Vyukov  wrote:
> On Thu, Nov 30, 2017 at 10:17 AM, Eric Biggers  wrote:
>> On Tue, Nov 28, 2017 at 10:36:01AM -0800, syzbot wrote:
>>> WARNING: kernel stack regs at 8801c1e5f468 in syzkaller196611:6199 has
>>> bad 'bp' value 0001
>>> unwind stack type:0 next_sp:  (null) mask:0x6 graph_idx:0
>>> 8801db4075a8: 8801db407630 (0x8801db407630)
>>> 8801db4075b0: 8128a84e (__save_stack_trace+0x6e/0xd0)
>>> 8801db4075b8:  ...
>>> 8801db4075c0: 8801c1e58000 (0x8801c1e58000)
>>> 8801db4075c8: 8801c1e6 (0x8801c1e6)
>>> 8801db4075d0:  ...
>>> 8801db4075d8: 0006 (0x6)
>>> 8801db4075e0: 8801c1e4e000 (0x8801c1e4e000)
>>> 8801db4075e8: 0101 (0x101)
>>> 8801db4075f0:  ...
>>> 8801db4075f8: 8801db4075a8 (0x8801db4075a8)
>>> 8801db407600: 8134ff7d (__twofish_enc_blk_3way+0x1b1d/0x1b30)
>>
>> Looks like the x86_64 "3 way" version of Twofish 
>> (twofish-x86_64-asm_64-3way.S)
>> needs to be updated to not use %rbp.
>
>
> This is what is supposed to be fixed with "crypto: x86/twofish-3way -
> Fix %rbp usage", right? Was it merged anywhere?
> This is one of top crashers with 15K crashes.


#syz fix: crypto: x86/twofish-3way - Fix %rbp usage


Re: WARNING: kernel stack regs has bad 'bp' value (2)

2017-12-27 Thread Dmitry Vyukov
On Wed, Dec 27, 2017 at 7:29 PM, Dmitry Vyukov  wrote:
> On Thu, Nov 30, 2017 at 10:17 AM, Eric Biggers  wrote:
>> On Tue, Nov 28, 2017 at 10:36:01AM -0800, syzbot wrote:
>>> WARNING: kernel stack regs at 8801c1e5f468 in syzkaller196611:6199 has
>>> bad 'bp' value 0001
>>> unwind stack type:0 next_sp:  (null) mask:0x6 graph_idx:0
>>> 8801db4075a8: 8801db407630 (0x8801db407630)
>>> 8801db4075b0: 8128a84e (__save_stack_trace+0x6e/0xd0)
>>> 8801db4075b8:  ...
>>> 8801db4075c0: 8801c1e58000 (0x8801c1e58000)
>>> 8801db4075c8: 8801c1e6 (0x8801c1e6)
>>> 8801db4075d0:  ...
>>> 8801db4075d8: 0006 (0x6)
>>> 8801db4075e0: 8801c1e4e000 (0x8801c1e4e000)
>>> 8801db4075e8: 0101 (0x101)
>>> 8801db4075f0:  ...
>>> 8801db4075f8: 8801db4075a8 (0x8801db4075a8)
>>> 8801db407600: 8134ff7d (__twofish_enc_blk_3way+0x1b1d/0x1b30)
>>
>> Looks like the x86_64 "3 way" version of Twofish 
>> (twofish-x86_64-asm_64-3way.S)
>> needs to be updated to not use %rbp.
>
>
> This is what is supposed to be fixed with "crypto: x86/twofish-3way -
> Fix %rbp usage", right? Was it merged anywhere?
> This is one of top crashers with 15K crashes.


#syz fix: crypto: x86/twofish-3way - Fix %rbp usage


Re: [PATCH net-next 1/6] phy: add 2.5G SGMII mode to the phy_mode enum

2017-12-27 Thread Andrew Lunn
On Wed, Dec 27, 2017 at 11:14:41PM +0100, Antoine Tenart wrote:
> This patch adds one more generic PHY mode to the phy_mode enum, to allow
> configuring generic PHYs to the 2.5G SGMII mode by using the set_mode
> callback.
> 
> Signed-off-by: Antoine Tenart 
> ---
>  include/linux/phy/phy.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
> index 4f8423a948d5..70459a28f3a1 100644
> --- a/include/linux/phy/phy.h
> +++ b/include/linux/phy/phy.h
> @@ -28,6 +28,7 @@ enum phy_mode {
>   PHY_MODE_USB_DEVICE,
>   PHY_MODE_USB_OTG,
>   PHY_MODE_SGMII,
> + PHY_MODE_SGMII_2_5G,
>   PHY_MODE_10GKR,
>   PHY_MODE_UFS_HS_A,
>   PHY_MODE_UFS_HS_B,

Hi Antoine

There was a discussion maybe last month about adding 2.5G SGMII. I
would prefer 2500SGMII. Putting the number first makes it uniform with
the other defines, 1000BASEX, 25000BASEX, 10GKR.

Andrew


Re: [PATCH net-next 1/6] phy: add 2.5G SGMII mode to the phy_mode enum

2017-12-27 Thread Andrew Lunn
On Wed, Dec 27, 2017 at 11:14:41PM +0100, Antoine Tenart wrote:
> This patch adds one more generic PHY mode to the phy_mode enum, to allow
> configuring generic PHYs to the 2.5G SGMII mode by using the set_mode
> callback.
> 
> Signed-off-by: Antoine Tenart 
> ---
>  include/linux/phy/phy.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
> index 4f8423a948d5..70459a28f3a1 100644
> --- a/include/linux/phy/phy.h
> +++ b/include/linux/phy/phy.h
> @@ -28,6 +28,7 @@ enum phy_mode {
>   PHY_MODE_USB_DEVICE,
>   PHY_MODE_USB_OTG,
>   PHY_MODE_SGMII,
> + PHY_MODE_SGMII_2_5G,
>   PHY_MODE_10GKR,
>   PHY_MODE_UFS_HS_A,
>   PHY_MODE_UFS_HS_B,

Hi Antoine

There was a discussion maybe last month about adding 2.5G SGMII. I
would prefer 2500SGMII. Putting the number first makes it uniform with
the other defines, 1000BASEX, 25000BASEX, 10GKR.

Andrew


Re: [PATCH] Staging: ipx: fixed several no space before tabs coding style issues

2017-12-27 Thread Greg KH
On Wed, Dec 27, 2017 at 09:25:44PM +, Jianshen Liu wrote:
> Fixed several coding style warnings of "please, no space before tabs".
> 
> Signed-off-by: Jianshen Liu 
> ---
>  drivers/staging/ipx/af_ipx.c| 56 
> -
>  drivers/staging/ipx/ipx_proc.c  |  2 +-
>  drivers/staging/ipx/ipx_route.c |  6 ++---
>  3 files changed, 32 insertions(+), 32 deletions(-)

Please read drivers/staging/ipx/TODO


Re: [PATCH] Staging: ipx: fixed several no space before tabs coding style issues

2017-12-27 Thread Greg KH
On Wed, Dec 27, 2017 at 09:25:44PM +, Jianshen Liu wrote:
> Fixed several coding style warnings of "please, no space before tabs".
> 
> Signed-off-by: Jianshen Liu 
> ---
>  drivers/staging/ipx/af_ipx.c| 56 
> -
>  drivers/staging/ipx/ipx_proc.c  |  2 +-
>  drivers/staging/ipx/ipx_route.c |  6 ++---
>  3 files changed, 32 insertions(+), 32 deletions(-)

Please read drivers/staging/ipx/TODO


Re: [PATCH 1/2] nvmem: add driver for JZ4780 efuse

2017-12-27 Thread Marcin Nowakowski

Hi Mathieu, PrasannaKumar,

On 27.12.2017 13:27, Mathieu Malaterre wrote:

From: PrasannaKumar Muralidharan 

This patch brings support for the JZ4780 efuse. Currently it only expose
a read only access to the entire 8K bits efuse memory.

Tested-by: Mathieu Malaterre 
Signed-off-by: PrasannaKumar Muralidharan 
---



+
+/* main entry point */
+static int jz4780_efuse_read(void *context, unsigned int offset,
+   void *val, size_t bytes)
+{
+   static const int nsegments = sizeof(segments) / sizeof(*segments);
+   struct jz4780_efuse *efuse = context;
+   char buf[32];
+   char *cur = val;
+   int i;
+   /* PM recommends read/write each segment separately */
+   for (i = 0; i < nsegments; ++i) {
+   unsigned int *segment = segments[i];
+   unsigned int lpos = segment[0];
+   unsigned int buflen = segment[1] / 8;
+   unsigned int ncount = buflen / 32;
+   unsigned int remain = buflen % 32;
+   int j;


This doesn't look right, as offset & bytes are completely ignored. This 
means it will return data from an offset other than requested and may 
also overrun the provided output buffer?



+   /* EFUSE can read or write maximum 256bit in each time */
+   for (j = 0; j < ncount ; ++j) {
+   jz4780_efuse_read_32bytes(efuse, buf, lpos);
+   memcpy(cur, buf, sizeof(buf));
+   cur += sizeof(buf);
+   lpos += sizeof(buf);
+   }
+   if (remain) {
+   jz4780_efuse_read_32bytes(efuse, buf, lpos);
+   memcpy(cur, buf, remain);
+   cur += remain;
+   }
+   }
+
+   return 0;
+}



Marcin




Re: [PATCH 1/2] nvmem: add driver for JZ4780 efuse

2017-12-27 Thread Marcin Nowakowski

Hi Mathieu, PrasannaKumar,

On 27.12.2017 13:27, Mathieu Malaterre wrote:

From: PrasannaKumar Muralidharan 

This patch brings support for the JZ4780 efuse. Currently it only expose
a read only access to the entire 8K bits efuse memory.

Tested-by: Mathieu Malaterre 
Signed-off-by: PrasannaKumar Muralidharan 
---



+
+/* main entry point */
+static int jz4780_efuse_read(void *context, unsigned int offset,
+   void *val, size_t bytes)
+{
+   static const int nsegments = sizeof(segments) / sizeof(*segments);
+   struct jz4780_efuse *efuse = context;
+   char buf[32];
+   char *cur = val;
+   int i;
+   /* PM recommends read/write each segment separately */
+   for (i = 0; i < nsegments; ++i) {
+   unsigned int *segment = segments[i];
+   unsigned int lpos = segment[0];
+   unsigned int buflen = segment[1] / 8;
+   unsigned int ncount = buflen / 32;
+   unsigned int remain = buflen % 32;
+   int j;


This doesn't look right, as offset & bytes are completely ignored. This 
means it will return data from an offset other than requested and may 
also overrun the provided output buffer?



+   /* EFUSE can read or write maximum 256bit in each time */
+   for (j = 0; j < ncount ; ++j) {
+   jz4780_efuse_read_32bytes(efuse, buf, lpos);
+   memcpy(cur, buf, sizeof(buf));
+   cur += sizeof(buf);
+   lpos += sizeof(buf);
+   }
+   if (remain) {
+   jz4780_efuse_read_32bytes(efuse, buf, lpos);
+   memcpy(cur, buf, remain);
+   cur += remain;
+   }
+   }
+
+   return 0;
+}



Marcin




Re: [PATCH] crypto: stm32 - Use standard CONFIG name

2017-12-27 Thread Herbert Xu
On Wed, Dec 20, 2017 at 06:19:32PM +, Corentin Labbe wrote:
> All hardware crypto devices have their CONFIG names using the following
> convention:
> CRYPTO_DEV_name_algo
> 
> This patch apply this conventions on STM32 CONFIG names.
> 
> Signed-off-by: Corentin Labbe 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: stm32 - Use standard CONFIG name

2017-12-27 Thread Herbert Xu
On Wed, Dec 20, 2017 at 06:19:32PM +, Corentin Labbe wrote:
> All hardware crypto devices have their CONFIG names using the following
> convention:
> CRYPTO_DEV_name_algo
> 
> This patch apply this conventions on STM32 CONFIG names.
> 
> Signed-off-by: Corentin Labbe 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 0/6] crypto: tcrypt: fix and add multi buf speed tests

2017-12-27 Thread Herbert Xu
On Sun, Dec 17, 2017 at 08:28:59AM +, Gilad Ben-Yossef wrote:
> The performance of some crypto tfm providers is affected by
> the amount of parallelism possible with the processing.
> 
> We already had some support for speed test of multiple concurrent
> requests, dubbed multi buffer, in ahash speed tests.
>   
>   
> This patch set extends said support and add similar support for skcipher
> and AEAD, as well as fixes some odd bugs discovered along the way.
>   
>   
> It is noted that it is possible to consolidate some of the none multi
> buffer speed test code better, but given that tcrypt as a whole is a
> developer testing harness rather than production code, the value of
> this activity seems questionable. Do let me know if you disagree.
> 
> Signed-off-by: Gilad Ben-Yossef 

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: x86/twofish-3way - Fix %rbp usage

2017-12-27 Thread Herbert Xu
On Mon, Dec 18, 2017 at 04:40:26PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Using %rbp as a temporary register breaks frame pointer convention and
> breaks stack traces when unwinding from an interrupt in the crypto code.
> 
> In twofish-3way, we can't simply replace %rbp with another register
> because there are none available.  Instead, we use the stack to hold the
> values that %rbp, %r11, and %r12 were holding previously.  Each of these
> values represents the half of the output from the previous Feistel round
> that is being passed on unchanged to the following round.  They are only
> used once per round, when they are exchanged with %rax, %rbx, and %rcx.
> 
> As a result, we free up 3 registers (one per block) and can reassign
> them so that %rbp is not used, and additionally %r14 and %r15 are not
> used so they do not need to be saved/restored.
> 
> There may be a small overhead caused by replacing 'xchg REG, REG' with
> the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per
> round.  But, counterintuitively, when I tested "ctr-twofish-3way" on a
> Haswell processor, the new version was actually about 2% faster.
> (Perhaps 'xchg' is not as well optimized as plain moves.)
> 
> Reported-by: syzbot 
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 0/6] crypto: tcrypt: fix and add multi buf speed tests

2017-12-27 Thread Herbert Xu
On Sun, Dec 17, 2017 at 08:28:59AM +, Gilad Ben-Yossef wrote:
> The performance of some crypto tfm providers is affected by
> the amount of parallelism possible with the processing.
> 
> We already had some support for speed test of multiple concurrent
> requests, dubbed multi buffer, in ahash speed tests.
>   
>   
> This patch set extends said support and add similar support for skcipher
> and AEAD, as well as fixes some odd bugs discovered along the way.
>   
>   
> It is noted that it is possible to consolidate some of the none multi
> buffer speed test code better, but given that tcrypt as a whole is a
> developer testing harness rather than production code, the value of
> this activity seems questionable. Do let me know if you disagree.
> 
> Signed-off-by: Gilad Ben-Yossef 

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: x86/twofish-3way - Fix %rbp usage

2017-12-27 Thread Herbert Xu
On Mon, Dec 18, 2017 at 04:40:26PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Using %rbp as a temporary register breaks frame pointer convention and
> breaks stack traces when unwinding from an interrupt in the crypto code.
> 
> In twofish-3way, we can't simply replace %rbp with another register
> because there are none available.  Instead, we use the stack to hold the
> values that %rbp, %r11, and %r12 were holding previously.  Each of these
> values represents the half of the output from the previous Feistel round
> that is being passed on unchanged to the following round.  They are only
> used once per round, when they are exchanged with %rax, %rbx, and %rcx.
> 
> As a result, we free up 3 registers (one per block) and can reassign
> them so that %rbp is not used, and additionally %r14 and %r15 are not
> used so they do not need to be saved/restored.
> 
> There may be a small overhead caused by replacing 'xchg REG, REG' with
> the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per
> round.  But, counterintuitively, when I tested "ctr-twofish-3way" on a
> Haswell processor, the new version was actually about 2% faster.
> (Perhaps 'xchg' is not as well optimized as plain moves.)
> 
> Reported-by: syzbot 
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH net-next v9 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE

2017-12-27 Thread Kunihiko Hayashi
DT bindings for the AVE ethernet controller found on Socionext's
UniPhier platforms.

Signed-off-by: Kunihiko Hayashi 
Signed-off-by: Jassi Brar 
Acked-by: Rob Herring 
Reviewed-by: Florian Fainelli 
---
 .../bindings/net/socionext,uniphier-ave4.txt   | 48 ++
 1 file changed, 48 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt 
b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
new file mode 100644
index 000..270ea4e
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -0,0 +1,48 @@
+* Socionext AVE ethernet controller
+
+This describes the devicetree bindings for AVE ethernet controller
+implemented on Socionext UniPhier SoCs.
+
+Required properties:
+ - compatible: Should be
+   - "socionext,uniphier-pro4-ave4" : for Pro4 SoC
+   - "socionext,uniphier-pxs2-ave4" : for PXs2 SoC
+   - "socionext,uniphier-ld11-ave4" : for LD11 SoC
+   - "socionext,uniphier-ld20-ave4" : for LD20 SoC
+ - reg: Address where registers are mapped and size of region.
+ - interrupts: Should contain the MAC interrupt.
+ - phy-mode: See ethernet.txt in the same directory. Allow to choose
+   "rgmii", "rmii", or "mii" according to the PHY.
+ - phy-handle: Should point to the external phy device.
+   See ethernet.txt file in the same directory.
+ - clocks: A phandle to the clock for the MAC.
+
+Optional properties:
+ - resets: A phandle to the reset control for the MAC.
+ - local-mac-address: See ethernet.txt in the same directory.
+
+Required subnode:
+ - mdio: A container for child nodes representing phy nodes.
+ See phy.txt in the same directory.
+
+Example:
+
+   ether: ethernet@6500 {
+   compatible = "socionext,uniphier-ld20-ave4";
+   reg = <0x6500 0x8500>;
+   interrupts = <0 66 4>;
+   phy-mode = "rgmii";
+   phy-handle = <>;
+   clocks = <_clk 6>;
+   resets = <_rst 6>;
+   local-mac-address = [00 00 00 00 00 00];
+
+   mdio {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ethphy: ethphy@1 {
+   reg = <1>;
+   };
+   };
+   };
-- 
2.7.4



[PATCH net-next v9 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE

2017-12-27 Thread Kunihiko Hayashi
DT bindings for the AVE ethernet controller found on Socionext's
UniPhier platforms.

Signed-off-by: Kunihiko Hayashi 
Signed-off-by: Jassi Brar 
Acked-by: Rob Herring 
Reviewed-by: Florian Fainelli 
---
 .../bindings/net/socionext,uniphier-ave4.txt   | 48 ++
 1 file changed, 48 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt 
b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
new file mode 100644
index 000..270ea4e
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -0,0 +1,48 @@
+* Socionext AVE ethernet controller
+
+This describes the devicetree bindings for AVE ethernet controller
+implemented on Socionext UniPhier SoCs.
+
+Required properties:
+ - compatible: Should be
+   - "socionext,uniphier-pro4-ave4" : for Pro4 SoC
+   - "socionext,uniphier-pxs2-ave4" : for PXs2 SoC
+   - "socionext,uniphier-ld11-ave4" : for LD11 SoC
+   - "socionext,uniphier-ld20-ave4" : for LD20 SoC
+ - reg: Address where registers are mapped and size of region.
+ - interrupts: Should contain the MAC interrupt.
+ - phy-mode: See ethernet.txt in the same directory. Allow to choose
+   "rgmii", "rmii", or "mii" according to the PHY.
+ - phy-handle: Should point to the external phy device.
+   See ethernet.txt file in the same directory.
+ - clocks: A phandle to the clock for the MAC.
+
+Optional properties:
+ - resets: A phandle to the reset control for the MAC.
+ - local-mac-address: See ethernet.txt in the same directory.
+
+Required subnode:
+ - mdio: A container for child nodes representing phy nodes.
+ See phy.txt in the same directory.
+
+Example:
+
+   ether: ethernet@6500 {
+   compatible = "socionext,uniphier-ld20-ave4";
+   reg = <0x6500 0x8500>;
+   interrupts = <0 66 4>;
+   phy-mode = "rgmii";
+   phy-handle = <>;
+   clocks = <_clk 6>;
+   resets = <_rst 6>;
+   local-mac-address = [00 00 00 00 00 00];
+
+   mdio {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ethphy: ethphy@1 {
+   reg = <1>;
+   };
+   };
+   };
-- 
2.7.4



[PATCH net-next v9 2/2] net: ethernet: socionext: add AVE ethernet driver

2017-12-27 Thread Kunihiko Hayashi
The UniPhier platform from Socionext provides the AVE ethernet
controller that includes MAC and MDIO bus supporting RGMII/RMII
modes. The controller is named AVE.

Signed-off-by: Kunihiko Hayashi 
Signed-off-by: Jassi Brar 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ethernet/Kconfig |1 +
 drivers/net/ethernet/Makefile|1 +
 drivers/net/ethernet/socionext/Kconfig   |   22 +
 drivers/net/ethernet/socionext/Makefile  |5 +
 drivers/net/ethernet/socionext/sni_ave.c | 1736 ++
 5 files changed, 1765 insertions(+)
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c604213..d50519e 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
 source "drivers/net/ethernet/sfc/Kconfig"
 source "drivers/net/ethernet/sgi/Kconfig"
 source "drivers/net/ethernet/smsc/Kconfig"
+source "drivers/net/ethernet/socionext/Kconfig"
 source "drivers/net/ethernet/stmicro/Kconfig"
 source "drivers/net/ethernet/sun/Kconfig"
 source "drivers/net/ethernet/tehuti/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 39f62733..6cf5ade 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/
 obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
 obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
+obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
diff --git a/drivers/net/ethernet/socionext/Kconfig 
b/drivers/net/ethernet/socionext/Kconfig
new file mode 100644
index 000..3a1829e
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -0,0 +1,22 @@
+config NET_VENDOR_SOCIONEXT
+   bool "Socionext ethernet drivers"
+   default y
+   ---help---
+ Option to select ethernet drivers for Socionext platforms.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ the questions about Socionext devices. If you say Y, you will be asked
+ for your specific card in the following questions.
+
+if NET_VENDOR_SOCIONEXT
+
+config SNI_AVE
+   tristate "Socionext AVE ethernet support"
+   depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
+   select PHYLIB
+   ---help---
+ Driver for gigabit ethernet MACs, called AVE, in the
+ Socionext UniPhier family.
+
+endif #NET_VENDOR_SOCIONEXT
diff --git a/drivers/net/ethernet/socionext/Makefile 
b/drivers/net/ethernet/socionext/Makefile
new file mode 100644
index 000..ab83df6
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for all ethernet ip drivers on Socionext platforms
+#
+obj-$(CONFIG_SNI_AVE) += sni_ave.o
diff --git a/drivers/net/ethernet/socionext/sni_ave.c 
b/drivers/net/ethernet/socionext/sni_ave.c
new file mode 100644
index 000..111e7ca
--- /dev/null
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -0,0 +1,1736 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * sni_ave.c - Socionext UniPhier AVE ethernet driver
+ * Copyright 2014 Panasonic Corporation
+ * Copyright 2015-2017 Socionext Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* General Register Group */
+#define AVE_IDR0x000   /* ID */
+#define AVE_VR 0x004   /* Version */
+#define AVE_GRR0x008   /* Global Reset */
+#define AVE_CFGR   0x00c   /* Configuration */
+
+/* Interrupt Register Group */
+#define AVE_GIMR   0x100   /* Global Interrupt Mask */
+#define AVE_GISR   0x104   /* Global Interrupt Status */
+
+/* MAC Register Group */
+#define AVE_TXCR   0x200   /* TX Setup */
+#define AVE_RXCR   0x204   /* RX Setup */
+#define AVE_RXMAC1R0x208   /* MAC address (lower) */
+#define AVE_RXMAC2R0x20c   /* MAC address (upper) */
+#define AVE_MDIOCTR0x214   /* MDIO Control */
+#define AVE_MDIOAR 0x218   /* MDIO Address */
+#define AVE_MDIOWDR0x21c   /* MDIO Data */
+#define AVE_MDIOSR 0x220   /* MDIO Status */
+#define AVE_MDIORDR0x224   /* MDIO Rd Data */
+
+/* Descriptor Control Register Group */
+#define AVE_DESCC  0x300   /* Descriptor Control */
+#define AVE_TXDC   0x304   

[PATCH net-next v9 2/2] net: ethernet: socionext: add AVE ethernet driver

2017-12-27 Thread Kunihiko Hayashi
The UniPhier platform from Socionext provides the AVE ethernet
controller that includes MAC and MDIO bus supporting RGMII/RMII
modes. The controller is named AVE.

Signed-off-by: Kunihiko Hayashi 
Signed-off-by: Jassi Brar 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ethernet/Kconfig |1 +
 drivers/net/ethernet/Makefile|1 +
 drivers/net/ethernet/socionext/Kconfig   |   22 +
 drivers/net/ethernet/socionext/Makefile  |5 +
 drivers/net/ethernet/socionext/sni_ave.c | 1736 ++
 5 files changed, 1765 insertions(+)
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c604213..d50519e 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
 source "drivers/net/ethernet/sfc/Kconfig"
 source "drivers/net/ethernet/sgi/Kconfig"
 source "drivers/net/ethernet/smsc/Kconfig"
+source "drivers/net/ethernet/socionext/Kconfig"
 source "drivers/net/ethernet/stmicro/Kconfig"
 source "drivers/net/ethernet/sun/Kconfig"
 source "drivers/net/ethernet/tehuti/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 39f62733..6cf5ade 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_SFC) += sfc/
 obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
 obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
+obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
diff --git a/drivers/net/ethernet/socionext/Kconfig 
b/drivers/net/ethernet/socionext/Kconfig
new file mode 100644
index 000..3a1829e
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -0,0 +1,22 @@
+config NET_VENDOR_SOCIONEXT
+   bool "Socionext ethernet drivers"
+   default y
+   ---help---
+ Option to select ethernet drivers for Socionext platforms.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ the questions about Socionext devices. If you say Y, you will be asked
+ for your specific card in the following questions.
+
+if NET_VENDOR_SOCIONEXT
+
+config SNI_AVE
+   tristate "Socionext AVE ethernet support"
+   depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
+   select PHYLIB
+   ---help---
+ Driver for gigabit ethernet MACs, called AVE, in the
+ Socionext UniPhier family.
+
+endif #NET_VENDOR_SOCIONEXT
diff --git a/drivers/net/ethernet/socionext/Makefile 
b/drivers/net/ethernet/socionext/Makefile
new file mode 100644
index 000..ab83df6
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for all ethernet ip drivers on Socionext platforms
+#
+obj-$(CONFIG_SNI_AVE) += sni_ave.o
diff --git a/drivers/net/ethernet/socionext/sni_ave.c 
b/drivers/net/ethernet/socionext/sni_ave.c
new file mode 100644
index 000..111e7ca
--- /dev/null
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -0,0 +1,1736 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * sni_ave.c - Socionext UniPhier AVE ethernet driver
+ * Copyright 2014 Panasonic Corporation
+ * Copyright 2015-2017 Socionext Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* General Register Group */
+#define AVE_IDR0x000   /* ID */
+#define AVE_VR 0x004   /* Version */
+#define AVE_GRR0x008   /* Global Reset */
+#define AVE_CFGR   0x00c   /* Configuration */
+
+/* Interrupt Register Group */
+#define AVE_GIMR   0x100   /* Global Interrupt Mask */
+#define AVE_GISR   0x104   /* Global Interrupt Status */
+
+/* MAC Register Group */
+#define AVE_TXCR   0x200   /* TX Setup */
+#define AVE_RXCR   0x204   /* RX Setup */
+#define AVE_RXMAC1R0x208   /* MAC address (lower) */
+#define AVE_RXMAC2R0x20c   /* MAC address (upper) */
+#define AVE_MDIOCTR0x214   /* MDIO Control */
+#define AVE_MDIOAR 0x218   /* MDIO Address */
+#define AVE_MDIOWDR0x21c   /* MDIO Data */
+#define AVE_MDIOSR 0x220   /* MDIO Status */
+#define AVE_MDIORDR0x224   /* MDIO Rd Data */
+
+/* Descriptor Control Register Group */
+#define AVE_DESCC  0x300   /* Descriptor Control */
+#define AVE_TXDC   0x304   /* TX Descriptor Configuration */
+#define AVE_RXDC0  0x308   /* 

[PATCH net-next v9 0/2] add UniPhier AVE ethernet support

2017-12-27 Thread Kunihiko Hayashi
This series adds support for Socionext AVE ethernet controller implemented
on UniPhier SoCs. This driver supports RGMII/RMII modes.

v8: https://www.spinics.net/lists/netdev/msg474374.html

The PHY patch included in v1 has already separated in:
http://www.spinics.net/lists/netdev/msg454595.html

Changes since v8:
- move operators at the beginning of the line to the end of the previous line
- dt-bindings: add blank lines before mdio and phy subnodes

Changes since v7:
- dt-bindings: fix mdio subnode description

Changes since v6:
- sort the order of local variables from longest to shortest line
- fix ave_probe() which calls register_netdev() at the end of initialization
- dt-bindings: remove phy node descriptions in mdio node

Changes since v5:
- replace license boilerplate with SPDX Identifier
- remove inline directives and an unused function

Changes since v4:
- fix larger integer warning on AVE_PFMBYTE_MASK0

Changes since v3:
- remove checking dma address and use dma_set_mask() to restirct address
- replace ave_mdio_busywait() with read_poll_timeout()
- replace functions to access to registers with readl/writel() directly
- replace a function to access to macaddr with ave_hw_write_macaddr()
- change return value of ave_dma_map() to error value
- move mdiobus_unregister() from ave_remove() to ave_uninit()
- eliminate else block at the end of ave_dma_map()
- add mask definitions for packet filter
- sort bitmap definitions in descending order
- add error check to some functions
- rename and sort functions to clear sub-categories
- fix error value consistency
- remove unneeded initializers
- change type of constant arrays

Changes since v2:
- replace clk_get() with devm_clk_get()
- replace reset_control_get() with devm_reset_control_get_optional_shared()
- add error return when the error occurs on the above *_get functions
- sort soc data and compatible strings
- remove clearly obvious comments
- modify dt-bindings document consistent with these modifications

Changes since v1:
- add/remove devicetree properties and sub-node
  - remove "internal-phy-interrupt" and "desc-bits" property
  - add SoC data structures based on compatible strings
  - add node operation to apply "mdio" sub-node
- add support for features
  - add support for {get,set}_pauseparam and pause frame operations
  - add support for ndo_get_stats64 instead of ndo_get_stats
- replace with desiable functions
  - replace check for valid phy_mode with phy_interface{_mode}_is_rgmii()
  - replace phy attach message with phy_attached_info()
  - replace 32bit operation with {upper,lower}_32_bits() on ave_wdesc_addr()
  - replace nway_reset and get_link with generic functions
- move operations to proper functions
  - move phy_start_aneg() to ndo_open,
and remove unnecessary PHY interrupt operations
See http://www.spinics.net/lists/netdev/msg454590.html
  - move irq initialization and descriptor memory allocation to ndo_open
  - move initialization of reset and clock and mdiobus to ndo_init
- fix skbuffer operations
  - fix skb alignment operations and add Rx buffer adjustment for descriptor
See http://www.spinics.net/lists/netdev/msg456014.html
  - add error returns when dma_map_single() failed 
- clean up code structures
  - clean up wait-loop and wake-queue conditions
  - add ave_wdesc_addr() and offset definitions
  - add ave_macaddr_init() to clean up mac-address operation
  - fix checking whether Tx entry is not enough
  - fix supported features of phydev
  - add necessary free/disable operations
  - add phydev check on ave_{get,set}_wol()
  - remove netif_carrier functions, phydev initializer, and Tx budget check
- change obsolate codes
  - replace ndev->{base_addr,irq} with the members of ave_private
- rename goto labels and mask definitions, and remove unused codes

Kunihiko Hayashi (2):
  dt-bindings: net: add DT bindings for Socionext UniPhier AVE
  net: ethernet: socionext: add AVE ethernet driver

 .../bindings/net/socionext,uniphier-ave4.txt   |   48 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/socionext/Kconfig |   22 +
 drivers/net/ethernet/socionext/Makefile|5 +
 drivers/net/ethernet/socionext/sni_ave.c   | 1736 
 6 files changed, 1813 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

-- 
2.7.4



[PATCH net-next v9 0/2] add UniPhier AVE ethernet support

2017-12-27 Thread Kunihiko Hayashi
This series adds support for Socionext AVE ethernet controller implemented
on UniPhier SoCs. This driver supports RGMII/RMII modes.

v8: https://www.spinics.net/lists/netdev/msg474374.html

The PHY patch included in v1 has already separated in:
http://www.spinics.net/lists/netdev/msg454595.html

Changes since v8:
- move operators at the beginning of the line to the end of the previous line
- dt-bindings: add blank lines before mdio and phy subnodes

Changes since v7:
- dt-bindings: fix mdio subnode description

Changes since v6:
- sort the order of local variables from longest to shortest line
- fix ave_probe() which calls register_netdev() at the end of initialization
- dt-bindings: remove phy node descriptions in mdio node

Changes since v5:
- replace license boilerplate with SPDX Identifier
- remove inline directives and an unused function

Changes since v4:
- fix larger integer warning on AVE_PFMBYTE_MASK0

Changes since v3:
- remove checking dma address and use dma_set_mask() to restirct address
- replace ave_mdio_busywait() with read_poll_timeout()
- replace functions to access to registers with readl/writel() directly
- replace a function to access to macaddr with ave_hw_write_macaddr()
- change return value of ave_dma_map() to error value
- move mdiobus_unregister() from ave_remove() to ave_uninit()
- eliminate else block at the end of ave_dma_map()
- add mask definitions for packet filter
- sort bitmap definitions in descending order
- add error check to some functions
- rename and sort functions to clear sub-categories
- fix error value consistency
- remove unneeded initializers
- change type of constant arrays

Changes since v2:
- replace clk_get() with devm_clk_get()
- replace reset_control_get() with devm_reset_control_get_optional_shared()
- add error return when the error occurs on the above *_get functions
- sort soc data and compatible strings
- remove clearly obvious comments
- modify dt-bindings document consistent with these modifications

Changes since v1:
- add/remove devicetree properties and sub-node
  - remove "internal-phy-interrupt" and "desc-bits" property
  - add SoC data structures based on compatible strings
  - add node operation to apply "mdio" sub-node
- add support for features
  - add support for {get,set}_pauseparam and pause frame operations
  - add support for ndo_get_stats64 instead of ndo_get_stats
- replace with desiable functions
  - replace check for valid phy_mode with phy_interface{_mode}_is_rgmii()
  - replace phy attach message with phy_attached_info()
  - replace 32bit operation with {upper,lower}_32_bits() on ave_wdesc_addr()
  - replace nway_reset and get_link with generic functions
- move operations to proper functions
  - move phy_start_aneg() to ndo_open,
and remove unnecessary PHY interrupt operations
See http://www.spinics.net/lists/netdev/msg454590.html
  - move irq initialization and descriptor memory allocation to ndo_open
  - move initialization of reset and clock and mdiobus to ndo_init
- fix skbuffer operations
  - fix skb alignment operations and add Rx buffer adjustment for descriptor
See http://www.spinics.net/lists/netdev/msg456014.html
  - add error returns when dma_map_single() failed 
- clean up code structures
  - clean up wait-loop and wake-queue conditions
  - add ave_wdesc_addr() and offset definitions
  - add ave_macaddr_init() to clean up mac-address operation
  - fix checking whether Tx entry is not enough
  - fix supported features of phydev
  - add necessary free/disable operations
  - add phydev check on ave_{get,set}_wol()
  - remove netif_carrier functions, phydev initializer, and Tx budget check
- change obsolate codes
  - replace ndev->{base_addr,irq} with the members of ave_private
- rename goto labels and mask definitions, and remove unused codes

Kunihiko Hayashi (2):
  dt-bindings: net: add DT bindings for Socionext UniPhier AVE
  net: ethernet: socionext: add AVE ethernet driver

 .../bindings/net/socionext,uniphier-ave4.txt   |   48 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/socionext/Kconfig |   22 +
 drivers/net/ethernet/socionext/Makefile|5 +
 drivers/net/ethernet/socionext/sni_ave.c   | 1736 
 6 files changed, 1813 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

-- 
2.7.4



Re: [PATCH] padata: add SPDX identifier

2017-12-27 Thread Herbert Xu
Cheah Kok Cheong  wrote:
> Add SPDX license identifier according to the type of license text found
> in the file.
> 
> Cc: Philippe Ombredanne 
> Signed-off-by: Cheah Kok Cheong 
> ---
> kernel/padata.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/padata.c b/kernel/padata.c
> index e265953..eb9a9d9 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0
> /*
>  * padata.c - generic interface to process data streams in parallel
>  *

Steffen, are you OK with this patch?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] padata: add SPDX identifier

2017-12-27 Thread Herbert Xu
Cheah Kok Cheong  wrote:
> Add SPDX license identifier according to the type of license text found
> in the file.
> 
> Cc: Philippe Ombredanne 
> Signed-off-by: Cheah Kok Cheong 
> ---
> kernel/padata.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/padata.c b/kernel/padata.c
> index e265953..eb9a9d9 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0
> /*
>  * padata.c - generic interface to process data streams in parallel
>  *

Steffen, are you OK with this patch?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-27 Thread Sergey Senozhatsky

Hello,

On (12/21/17 23:19), Steven Rostedt wrote:
[..]
> > I wrote this before but this isn't a theoretical problem.  We see
> > these stalls a lot.  Preemption isn't enabled to begin with.  Memory
> > pressure is high and OOM triggers and printk starts printing out OOM
> > warnings; then, a network packet comes in which triggers allocations
> > in the network layer, which fails due to memory pressure, which then
> > generates memory allocation failure messages which then generates
> > netconsole packets which then tries to allocate more memory and so on.
> 
> It doesn't matter if preemption is enabled or not. The hand off should
> happen either way.

preemption does matter. it matters so much, that it makes the first
thing your patch depends on to be questionable - if CPUA stuck
   in console_unlock() printing other CPU's messages, it's
   because there are currently printk()-s happening on CPUB-CPUZ.

which is not always true. with preemption enabled CPUA can stuck in
console_unlock() because printk()-s from CPUB-CPUZ could have happened
seconds or even minutes ago. I have demonstrated it with the traces; it
didn't convenience you. OK, I sat down and went trough Tetsuo's reports
starting from 2016:

https://marc.info/?l=linux-kernel=151375384500555

and Tetsuo has reported several times that printing CPU can sleep for
minutes with console_sem being locked; while other CPUs are happy to
printk()->log_store() as much as they want. and that's why I believe
that that "The hand off should happen either way" is dangerous, especially
when we hand off from a preemptible context to an atomic context. you
don't think that this is a problem, because your patch requires logbuf to
be small enough for any atomic context (irq, under spin_lock, etc) to be
able to print out all the pending messages to all registered consoles [we
print to consoles sequentially] within watchdog_threshold seconds (10
seconds by default). who does adjust the size of the logbuf in a way that
console_unlock() can print the entire logbuf under 10 seconds?


> > It's just that there's no one else to give that flushing duty too, so
> > the pingpoinging that your patch implements can't really help
> > anything.
> > 
> > You argue that it isn't made worse by your patch, which may be true
> > but your patch doesn't solve actual problems and is most likely
> > unnecessary complication which gets in the way for the actual
> > solution.  It's a weird argument to push an approach which is
> > fundamentally broken.  Let's please stop that.
> 
> BULLSHIT!
> 
> It's not a complex solution, and coming from the cgroup and workqueue
> maintainer, that's a pretty ironic comment.
> 
> It has already been proven that it can solve problems:
> 
>   http://lkml.kernel.org/r/20171219143147.gb15...@dhcp22.suse.cz

and Tetsuo also said that his test does not cover all the scenarios;
and he has sent us all a pretty clear warning:

 : Therefore, while it is true that any approach would survive my
 : environment, it is dangerous to assume that any approach is safe
 : for my customer's enterprise servers.

https://marc.info/?l=linux-kernel=151377161705573


when it comes to lockups, printk() design is so flexible, that one can
justify nearly every patch no matter how many regressions it introduces,
by simply saying:
"but the same lockup scenario could have happened even before my
 patch. it's just printk() was designed this way. you were lucky
 enough not to hit the problem before; now you are less lucky".

been there, done that. it's a trap.

> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?

printk_kthread *used* to be way too independent. basically what we had
before was

bad_function()
 printk()
  vprintk_emit()
  {
if (oops_in_progress)
   can_offload_to_printk = false;

if (can_offload_to_printk)
   wake_up(printk_kthread);
else if (console_trylock())
   console_unlock();
  }


the bad things:

- first, we do not take into account the fact that printk_kthread can
  already be active.

- second, we do not take into account the fact that printk_kthread can
  be way-way behind - a huge gap between `console_seq' and `log_next_seq'.

- third, we do not take into account the fact that printk_kthread can
  be preempted under console_sem.

so, IOW, the CPU which was in trouble declared the "oh, we should not
offload to printk kthread" emergency only for itself, basically. the CPU
which printk_kthread was running on or sleeping on [preemption] did not
care, neither did printk_kthread.

what we have now:

- printk_kthread cannot be preempted under console_sem. if it acquires
  the console_sem it stays RUNNING, printing the messages.

- printk_kthread does check for emergency on other CPUs. right 

Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

2017-12-27 Thread Sergey Senozhatsky

Hello,

On (12/21/17 23:19), Steven Rostedt wrote:
[..]
> > I wrote this before but this isn't a theoretical problem.  We see
> > these stalls a lot.  Preemption isn't enabled to begin with.  Memory
> > pressure is high and OOM triggers and printk starts printing out OOM
> > warnings; then, a network packet comes in which triggers allocations
> > in the network layer, which fails due to memory pressure, which then
> > generates memory allocation failure messages which then generates
> > netconsole packets which then tries to allocate more memory and so on.
> 
> It doesn't matter if preemption is enabled or not. The hand off should
> happen either way.

preemption does matter. it matters so much, that it makes the first
thing your patch depends on to be questionable - if CPUA stuck
   in console_unlock() printing other CPU's messages, it's
   because there are currently printk()-s happening on CPUB-CPUZ.

which is not always true. with preemption enabled CPUA can stuck in
console_unlock() because printk()-s from CPUB-CPUZ could have happened
seconds or even minutes ago. I have demonstrated it with the traces; it
didn't convenience you. OK, I sat down and went trough Tetsuo's reports
starting from 2016:

https://marc.info/?l=linux-kernel=151375384500555

and Tetsuo has reported several times that printing CPU can sleep for
minutes with console_sem being locked; while other CPUs are happy to
printk()->log_store() as much as they want. and that's why I believe
that that "The hand off should happen either way" is dangerous, especially
when we hand off from a preemptible context to an atomic context. you
don't think that this is a problem, because your patch requires logbuf to
be small enough for any atomic context (irq, under spin_lock, etc) to be
able to print out all the pending messages to all registered consoles [we
print to consoles sequentially] within watchdog_threshold seconds (10
seconds by default). who does adjust the size of the logbuf in a way that
console_unlock() can print the entire logbuf under 10 seconds?


> > It's just that there's no one else to give that flushing duty too, so
> > the pingpoinging that your patch implements can't really help
> > anything.
> > 
> > You argue that it isn't made worse by your patch, which may be true
> > but your patch doesn't solve actual problems and is most likely
> > unnecessary complication which gets in the way for the actual
> > solution.  It's a weird argument to push an approach which is
> > fundamentally broken.  Let's please stop that.
> 
> BULLSHIT!
> 
> It's not a complex solution, and coming from the cgroup and workqueue
> maintainer, that's a pretty ironic comment.
> 
> It has already been proven that it can solve problems:
> 
>   http://lkml.kernel.org/r/20171219143147.gb15...@dhcp22.suse.cz

and Tetsuo also said that his test does not cover all the scenarios;
and he has sent us all a pretty clear warning:

 : Therefore, while it is true that any approach would survive my
 : environment, it is dangerous to assume that any approach is safe
 : for my customer's enterprise servers.

https://marc.info/?l=linux-kernel=151377161705573


when it comes to lockups, printk() design is so flexible, that one can
justify nearly every patch no matter how many regressions it introduces,
by simply saying:
"but the same lockup scenario could have happened even before my
 patch. it's just printk() was designed this way. you were lucky
 enough not to hit the problem before; now you are less lucky".

been there, done that. it's a trap.

> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?

printk_kthread *used* to be way too independent. basically what we had
before was

bad_function()
 printk()
  vprintk_emit()
  {
if (oops_in_progress)
   can_offload_to_printk = false;

if (can_offload_to_printk)
   wake_up(printk_kthread);
else if (console_trylock())
   console_unlock();
  }


the bad things:

- first, we do not take into account the fact that printk_kthread can
  already be active.

- second, we do not take into account the fact that printk_kthread can
  be way-way behind - a huge gap between `console_seq' and `log_next_seq'.

- third, we do not take into account the fact that printk_kthread can
  be preempted under console_sem.

so, IOW, the CPU which was in trouble declared the "oh, we should not
offload to printk kthread" emergency only for itself, basically. the CPU
which printk_kthread was running on or sleeping on [preemption] did not
care, neither did printk_kthread.

what we have now:

- printk_kthread cannot be preempted under console_sem. if it acquires
  the console_sem it stays RUNNING, printing the messages.

- printk_kthread does check for emergency on other CPUs. right 

RE: [PATCH v2] ftrace/module: Move ftrace_release_mod to ddebug_cleanup label

2017-12-27 Thread NAMIT GUPTA
 
Hello Steve,

As per the discussion I sent patch v2.

Please find.

Thanks,
Namit Gupta


- Original Message -
Sender : NAMIT GUPTA  ./Senior Chief 
Engineer/SRI-Delhi-Platform S/W 1 Team/Samsung Electronics
Date   : 2017-12-12 17:29 (GMT+5:30)
Title  : [PATCH v2] ftrace/module: Move ftrace_release_mod to ddebug_cleanup 
label
 
ftrace_module_init happen after dynamic_debug_setup, it is desired that
cleanup should be called after this label however in current implementation
it is called in free module label,ie:even though ftrace in not initialized,
from so many fail case ftrace_release_mod() will be called and unnecessary
traverse the whole list.
In below patch we moved ftrace_release_mod() from free_module label to
ddebug_cleanup label. that is the best possible location, other solution
is to make new label to ftrace_release_mod() but since ftrace_module_init()
is not return with minimum changes it should be in ddebug_cleanup label.
 
 
Signed-off-by: Namit Gupta 
 
---
 kernel/module.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)
 
diff --git a/kernel/module.c b/kernel/module.c
index 0d1cb8d..4be966a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3523,6 +3523,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
 unset_module_core_ro_nx(mod);
 
  ddebug_cleanup:
+ftrace_release_mod(mod);
 dynamic_debug_remove(info->debug);
 synchronize_sched();
 kfree(mod->args);
@@ -3541,12 +3542,6 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
 synchronize_rcu();
 mutex_unlock(_mutex);
  free_module:
-/*
- * Ftrace needs to clean up what it initialized.
- * This does nothing if ftrace_module_init() wasn't called,
- * but it must be called outside of module_mutex.
- */
-ftrace_release_mod(mod);
 /* Free lock-classes; relies on the preceding sync_rcu() */
 lockdep_free_key_range(mod->module_core, mod->core_size);
 
-- 
1.9.1
 
 


RE: [PATCH v2] ftrace/module: Move ftrace_release_mod to ddebug_cleanup label

2017-12-27 Thread NAMIT GUPTA
 
Hello Steve,

As per the discussion I sent patch v2.

Please find.

Thanks,
Namit Gupta


- Original Message -
Sender : NAMIT GUPTA  ./Senior Chief 
Engineer/SRI-Delhi-Platform S/W 1 Team/Samsung Electronics
Date   : 2017-12-12 17:29 (GMT+5:30)
Title  : [PATCH v2] ftrace/module: Move ftrace_release_mod to ddebug_cleanup 
label
 
ftrace_module_init happen after dynamic_debug_setup, it is desired that
cleanup should be called after this label however in current implementation
it is called in free module label,ie:even though ftrace in not initialized,
from so many fail case ftrace_release_mod() will be called and unnecessary
traverse the whole list.
In below patch we moved ftrace_release_mod() from free_module label to
ddebug_cleanup label. that is the best possible location, other solution
is to make new label to ftrace_release_mod() but since ftrace_module_init()
is not return with minimum changes it should be in ddebug_cleanup label.
 
 
Signed-off-by: Namit Gupta 
 
---
 kernel/module.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)
 
diff --git a/kernel/module.c b/kernel/module.c
index 0d1cb8d..4be966a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3523,6 +3523,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
 unset_module_core_ro_nx(mod);
 
  ddebug_cleanup:
+ftrace_release_mod(mod);
 dynamic_debug_remove(info->debug);
 synchronize_sched();
 kfree(mod->args);
@@ -3541,12 +3542,6 @@ static int load_module(struct load_info *info, const 
char __user *uargs,
 synchronize_rcu();
 mutex_unlock(_mutex);
  free_module:
-/*
- * Ftrace needs to clean up what it initialized.
- * This does nothing if ftrace_module_init() wasn't called,
- * but it must be called outside of module_mutex.
- */
-ftrace_release_mod(mod);
 /* Free lock-classes; relies on the preceding sync_rcu() */
 lockdep_free_key_range(mod->module_core, mod->core_size);
 
-- 
1.9.1
 
 


[PATCH net-next v6 5/6] net: dccp: Add DCCP sendmsg trace event

2017-12-27 Thread Masami Hiramatsu
Add DCCP sendmsg trace event (dccp/dccp_probe) for
replacing dccpprobe. User can trace this event via
ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
  Changes in v5
   - Fix to add local directory to include for trace.h.
 Thanks Steven!
---
 net/dccp/Makefile |3 ++
 net/dccp/proto.c  |5 +++
 net/dccp/trace.h  |  105 +
 3 files changed, 113 insertions(+)
 create mode 100644 net/dccp/trace.h

diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 2e7b56097bc4..4215f13a63af 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -27,3 +27,6 @@ dccp-$(CONFIG_SYSCTL) += sysctl.o
 
 dccp_diag-y := diag.o
 dccp_probe-y := probe.o
+
+# build with local directory for trace.h
+CFLAGS_proto.o := -I$(src)
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 7a75a1d3568b..fa7e92e08920 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -38,6 +38,9 @@
 #include "dccp.h"
 #include "feat.h"
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly;
 
 EXPORT_SYMBOL_GPL(dccp_statistics);
@@ -761,6 +764,8 @@ int dccp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t len)
int rc, size;
long timeo;
 
+   trace_dccp_probe(sk, len);
+
if (len > dp->dccps_mss_cache)
return -EMSGSIZE;
 
diff --git a/net/dccp/trace.h b/net/dccp/trace.h
new file mode 100644
index ..aa01321a6c37
--- /dev/null
+++ b/net/dccp/trace.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dccp
+
+#if !defined(_TRACE_DCCP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DCCP_H
+
+#include 
+#include "dccp.h"
+#include "ccids/ccid3.h"
+#include 
+
+TRACE_EVENT(dccp_probe,
+
+   TP_PROTO(struct sock *sk, size_t size),
+
+   TP_ARGS(sk, size),
+
+   TP_STRUCT__entry(
+   /* sockaddr_in6 is always bigger than sockaddr_in */
+   __array(__u8, saddr, sizeof(struct sockaddr_in6))
+   __array(__u8, daddr, sizeof(struct sockaddr_in6))
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __field(__u16, size)
+   __field(__u16, tx_s)
+   __field(__u32, tx_rtt)
+   __field(__u32, tx_p)
+   __field(__u32, tx_x_calc)
+   __field(__u64, tx_x_recv)
+   __field(__u64, tx_x)
+   __field(__u32, tx_t_ipi)
+   ),
+
+   TP_fast_assign(
+   const struct inet_sock *inet = inet_sk(sk);
+   struct ccid3_hc_tx_sock *hc = NULL;
+
+   if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3)
+   hc = ccid3_hc_tx_sk(sk);
+
+   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
+
+   if (sk->sk_family == AF_INET) {
+   struct sockaddr_in *v4 = (void *)__entry->saddr;
+
+   v4->sin_family = AF_INET;
+   v4->sin_port = inet->inet_sport;
+   v4->sin_addr.s_addr = inet->inet_saddr;
+   v4 = (void *)__entry->daddr;
+   v4->sin_family = AF_INET;
+   v4->sin_port = inet->inet_dport;
+   v4->sin_addr.s_addr = inet->inet_daddr;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (sk->sk_family == AF_INET6) {
+   struct sockaddr_in6 *v6 = (void *)__entry->saddr;
+
+   v6->sin6_family = AF_INET6;
+   v6->sin6_port = inet->inet_sport;
+   v6->sin6_addr = inet6_sk(sk)->saddr;
+   v6 = (void *)__entry->daddr;
+   v6->sin6_family = AF_INET6;
+   v6->sin6_port = inet->inet_dport;
+   v6->sin6_addr = sk->sk_v6_daddr;
+#endif
+   }
+
+   /* For filtering use */
+   __entry->sport = ntohs(inet->inet_sport);
+   __entry->dport = ntohs(inet->inet_dport);
+
+   __entry->size = size;
+   if (hc) {
+   __entry->tx_s = hc->tx_s;
+   __entry->tx_rtt = hc->tx_rtt;
+   __entry->tx_p = hc->tx_p;
+   __entry->tx_x_calc = hc->tx_x_calc;
+   __entry->tx_x_recv = hc->tx_x_recv >> 6;
+   __entry->tx_x = hc->tx_x >> 6;
+   __entry->tx_t_ipi = hc->tx_t_ipi;
+   } else {
+   __entry->tx_s = 0;
+   memset(&__entry->tx_rtt, 0, (void *)&__entry->tx_t_ipi -
+  (void *)&__entry->tx_rtt +
+  sizeof(__entry->tx_t_ipi));
+   }
+   ),
+
+   TP_printk("src=%pISpc dest=%pISpc 

[PATCH net-next v6 4/6] net: sctp: Remove debug SCTP probe module

2017-12-27 Thread Masami Hiramatsu
Remove SCTP probe module since jprobe has been deprecated.
That function is now replaced by sctp/sctp_probe and
sctp/sctp_probe_path trace-events.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 net/sctp/Kconfig  |   12 ---
 net/sctp/Makefile |3 -
 net/sctp/probe.c  |  244 -
 3 files changed, 259 deletions(-)
 delete mode 100644 net/sctp/probe.c

diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index d9c04dc1b3f3..c740b189d4ba 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -37,18 +37,6 @@ menuconfig IP_SCTP
 
 if IP_SCTP
 
-config NET_SCTPPROBE
-   tristate "SCTP: Association probing"
-depends on PROC_FS && KPROBES
----help---
-This module allows for capturing the changes to SCTP association
-state in response to incoming packets. It is used for debugging
-SCTP congestion control algorithms. If you don't understand
-what was just said, you don't need it: say N.
-
-To compile this code as a module, choose M here: the
-module will be called sctp_probe.
-
 config SCTP_DBG_OBJCNT
bool "SCTP: Debug object counts"
depends on PROC_FS
diff --git a/net/sctp/Makefile b/net/sctp/Makefile
index 54bd9c1a8aa1..6776582ec449 100644
--- a/net/sctp/Makefile
+++ b/net/sctp/Makefile
@@ -4,7 +4,6 @@
 #
 
 obj-$(CONFIG_IP_SCTP) += sctp.o
-obj-$(CONFIG_NET_SCTPPROBE) += sctp_probe.o
 obj-$(CONFIG_INET_SCTP_DIAG) += sctp_diag.o
 
 sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
@@ -16,8 +15,6 @@ sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
  offload.o stream_sched.o stream_sched_prio.o \
  stream_sched_rr.o stream_interleave.o
 
-sctp_probe-y := probe.o
-
 sctp-$(CONFIG_SCTP_DBG_OBJCNT) += objcnt.o
 sctp-$(CONFIG_PROC_FS) += proc.o
 sctp-$(CONFIG_SYSCTL) += sysctl.o
diff --git a/net/sctp/probe.c b/net/sctp/probe.c
deleted file mode 100644
index 1280f85a598d..
--- a/net/sctp/probe.c
+++ /dev/null
@@ -1,244 +0,0 @@
-/*
- * sctp_probe - Observe the SCTP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * Modified for SCTP from Stephen Hemminger's code
- * Copyright (C) 2010, Wei Yongjun 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
-MODULE_SOFTDEP("pre: sctp");
-MODULE_AUTHOR("Wei Yongjun ");
-MODULE_DESCRIPTION("SCTP snooper");
-MODULE_LICENSE("GPL");
-
-static int port __read_mostly = 0;
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-static unsigned int fwmark __read_mostly = 0;
-MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)");
-module_param(fwmark, uint, 0);
-
-static int bufsize __read_mostly = 64 * 1024;
-MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)");
-module_param(bufsize, int, 0);
-
-static int full __read_mostly = 1;
-MODULE_PARM_DESC(full, "Full log (1=every ack packet received,  0=only cwnd 
changes)");
-module_param(full, int, 0);
-
-static const char procname[] = "sctpprobe";
-
-static struct {
-   struct kfifo  fifo;
-   spinlock_tlock;
-   wait_queue_head_t wait;
-   struct timespec64 tstart;
-} sctpw;
-
-static __printf(1, 2) void printl(const char *fmt, ...)
-{
-   va_list args;
-   int len;
-   char tbuf[256];
-
-   va_start(args, fmt);
-   len = vscnprintf(tbuf, sizeof(tbuf), fmt, args);
-   va_end(args);
-
-   kfifo_in_locked(, tbuf, len, );
-   wake_up();
-}
-
-static int sctpprobe_open(struct inode *inode, struct file *file)
-{
-   kfifo_reset();
-   ktime_get_ts64();
-
-   return 0;
-}
-
-static ssize_t sctpprobe_read(struct file *file, char __user *buf,
- size_t len, loff_t *ppos)
-{
-   int error = 0, cnt = 0;
-   unsigned char *tbuf;
-
-   if (!buf)
-   return -EINVAL;
-
-   if (len == 0)
-   return 0;
-
-   tbuf = vmalloc(len);
-   if (!tbuf)
-   return 

[PATCH net-next v6 1/6] net: tcp: Add trace events for TCP congestion window tracing

2017-12-27 Thread Masami Hiramatsu
This adds an event to trace TCP stat variables with
slightly intrusive trace-event. This uses ftrace/perf
event log buffer to trace those state, no needs to
prepare own ring-buffer, nor custom user apps.

User can use ftrace to trace this event as below;

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/tcp/tcp_probe/enable
  (run workloads)
  # cat trace

Signed-off-by: Masami Hiramatsu 
---
 Changes in v6:
  - Avoid preprocessor directives in tracepoint macro args as
Mat did on net tree.
---
 include/trace/events/tcp.h |   97 
 net/ipv4/tcp_input.c   |3 +
 2 files changed, 100 insertions(+)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 8e88a1671538..4dea6342f7d4 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM tcp
 
@@ -8,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * tcp event with arguments sk and skb
@@ -277,6 +279,101 @@ TRACE_EVENT(tcp_retransmit_synack,
  __entry->saddr_v6, __entry->daddr_v6)
 );
 
+
+#define TP_STORE_ADDR_PORTS_V4(__entry, inet, sk)  \
+   do {\
+   struct sockaddr_in *v4 = (void *)__entry->saddr;\
+   \
+   v4->sin_family = AF_INET;   \
+   v4->sin_port = inet->inet_sport;\
+   v4->sin_addr.s_addr = inet->inet_saddr; \
+   v4 = (void *)__entry->daddr;\
+   v4->sin_family = AF_INET;   \
+   v4->sin_port = inet->inet_dport;\
+   v4->sin_addr.s_addr = inet->inet_daddr; \
+   } while (0)
+
+#if IS_ENABLED(CONFIG_IPV6)
+
+#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
+   do {\
+   if (sk->sk_family == AF_INET6) {\
+   struct sockaddr_in6 *v6 = (void *)__entry->saddr; \
+   \
+   v6->sin6_family = AF_INET6; \
+   v6->sin6_port = inet->inet_sport;   \
+   v6->sin6_addr = inet6_sk(sk)->saddr;\
+   v6 = (void *)__entry->daddr;\
+   v6->sin6_family = AF_INET6; \
+   v6->sin6_port = inet->inet_dport;   \
+   v6->sin6_addr = sk->sk_v6_daddr;\
+   } else  \
+   TP_STORE_ADDR_PORTS_V4(__entry, inet, sk);  \
+   } while (0)
+
+#else
+
+#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
+   TP_STORE_ADDR_PORTS_V4(__entry, inet, sk);
+
+#endif
+
+TRACE_EVENT(tcp_probe,
+
+   TP_PROTO(struct sock *sk, struct sk_buff *skb),
+
+   TP_ARGS(sk, skb),
+
+   TP_STRUCT__entry(
+   /* sockaddr_in6 is always bigger than sockaddr_in */
+   __array(__u8, saddr, sizeof(struct sockaddr_in6))
+   __array(__u8, daddr, sizeof(struct sockaddr_in6))
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __field(__u32, mark)
+   __field(__u16, length)
+   __field(__u32, snd_nxt)
+   __field(__u32, snd_una)
+   __field(__u32, snd_cwnd)
+   __field(__u32, ssthresh)
+   __field(__u32, snd_wnd)
+   __field(__u32, srtt)
+   __field(__u32, rcv_wnd)
+   ),
+
+   TP_fast_assign(
+   const struct tcp_sock *tp = tcp_sk(sk);
+   const struct inet_sock *inet = inet_sk(sk);
+
+   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
+
+   TP_STORE_ADDR_PORTS(__entry, inet, sk);
+
+   /* For filtering use */
+   __entry->sport = ntohs(inet->inet_sport);
+   __entry->dport = ntohs(inet->inet_dport);
+   __entry->mark = skb->mark;
+
+   __entry->length = skb->len;
+   __entry->snd_nxt = tp->snd_nxt;
+   __entry->snd_una = tp->snd_una;
+   __entry->snd_cwnd = tp->snd_cwnd;
+   __entry->snd_wnd = tp->snd_wnd;
+   __entry->rcv_wnd = tp->rcv_wnd;
+   __entry->ssthresh = tcp_current_ssthresh(sk);
+   __entry->srtt = tp->srtt_us >> 3;
+   ),
+
+   

[PATCH net-next v6 5/6] net: dccp: Add DCCP sendmsg trace event

2017-12-27 Thread Masami Hiramatsu
Add DCCP sendmsg trace event (dccp/dccp_probe) for
replacing dccpprobe. User can trace this event via
ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
  Changes in v5
   - Fix to add local directory to include for trace.h.
 Thanks Steven!
---
 net/dccp/Makefile |3 ++
 net/dccp/proto.c  |5 +++
 net/dccp/trace.h  |  105 +
 3 files changed, 113 insertions(+)
 create mode 100644 net/dccp/trace.h

diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 2e7b56097bc4..4215f13a63af 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -27,3 +27,6 @@ dccp-$(CONFIG_SYSCTL) += sysctl.o
 
 dccp_diag-y := diag.o
 dccp_probe-y := probe.o
+
+# build with local directory for trace.h
+CFLAGS_proto.o := -I$(src)
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 7a75a1d3568b..fa7e92e08920 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -38,6 +38,9 @@
 #include "dccp.h"
 #include "feat.h"
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 DEFINE_SNMP_STAT(struct dccp_mib, dccp_statistics) __read_mostly;
 
 EXPORT_SYMBOL_GPL(dccp_statistics);
@@ -761,6 +764,8 @@ int dccp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t len)
int rc, size;
long timeo;
 
+   trace_dccp_probe(sk, len);
+
if (len > dp->dccps_mss_cache)
return -EMSGSIZE;
 
diff --git a/net/dccp/trace.h b/net/dccp/trace.h
new file mode 100644
index ..aa01321a6c37
--- /dev/null
+++ b/net/dccp/trace.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dccp
+
+#if !defined(_TRACE_DCCP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DCCP_H
+
+#include 
+#include "dccp.h"
+#include "ccids/ccid3.h"
+#include 
+
+TRACE_EVENT(dccp_probe,
+
+   TP_PROTO(struct sock *sk, size_t size),
+
+   TP_ARGS(sk, size),
+
+   TP_STRUCT__entry(
+   /* sockaddr_in6 is always bigger than sockaddr_in */
+   __array(__u8, saddr, sizeof(struct sockaddr_in6))
+   __array(__u8, daddr, sizeof(struct sockaddr_in6))
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __field(__u16, size)
+   __field(__u16, tx_s)
+   __field(__u32, tx_rtt)
+   __field(__u32, tx_p)
+   __field(__u32, tx_x_calc)
+   __field(__u64, tx_x_recv)
+   __field(__u64, tx_x)
+   __field(__u32, tx_t_ipi)
+   ),
+
+   TP_fast_assign(
+   const struct inet_sock *inet = inet_sk(sk);
+   struct ccid3_hc_tx_sock *hc = NULL;
+
+   if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3)
+   hc = ccid3_hc_tx_sk(sk);
+
+   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
+
+   if (sk->sk_family == AF_INET) {
+   struct sockaddr_in *v4 = (void *)__entry->saddr;
+
+   v4->sin_family = AF_INET;
+   v4->sin_port = inet->inet_sport;
+   v4->sin_addr.s_addr = inet->inet_saddr;
+   v4 = (void *)__entry->daddr;
+   v4->sin_family = AF_INET;
+   v4->sin_port = inet->inet_dport;
+   v4->sin_addr.s_addr = inet->inet_daddr;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (sk->sk_family == AF_INET6) {
+   struct sockaddr_in6 *v6 = (void *)__entry->saddr;
+
+   v6->sin6_family = AF_INET6;
+   v6->sin6_port = inet->inet_sport;
+   v6->sin6_addr = inet6_sk(sk)->saddr;
+   v6 = (void *)__entry->daddr;
+   v6->sin6_family = AF_INET6;
+   v6->sin6_port = inet->inet_dport;
+   v6->sin6_addr = sk->sk_v6_daddr;
+#endif
+   }
+
+   /* For filtering use */
+   __entry->sport = ntohs(inet->inet_sport);
+   __entry->dport = ntohs(inet->inet_dport);
+
+   __entry->size = size;
+   if (hc) {
+   __entry->tx_s = hc->tx_s;
+   __entry->tx_rtt = hc->tx_rtt;
+   __entry->tx_p = hc->tx_p;
+   __entry->tx_x_calc = hc->tx_x_calc;
+   __entry->tx_x_recv = hc->tx_x_recv >> 6;
+   __entry->tx_x = hc->tx_x >> 6;
+   __entry->tx_t_ipi = hc->tx_t_ipi;
+   } else {
+   __entry->tx_s = 0;
+   memset(&__entry->tx_rtt, 0, (void *)&__entry->tx_t_ipi -
+  (void *)&__entry->tx_rtt +
+  sizeof(__entry->tx_t_ipi));
+   }
+   ),
+
+   TP_printk("src=%pISpc dest=%pISpc size=%d tx_s=%d 

[PATCH net-next v6 4/6] net: sctp: Remove debug SCTP probe module

2017-12-27 Thread Masami Hiramatsu
Remove SCTP probe module since jprobe has been deprecated.
That function is now replaced by sctp/sctp_probe and
sctp/sctp_probe_path trace-events.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 net/sctp/Kconfig  |   12 ---
 net/sctp/Makefile |3 -
 net/sctp/probe.c  |  244 -
 3 files changed, 259 deletions(-)
 delete mode 100644 net/sctp/probe.c

diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index d9c04dc1b3f3..c740b189d4ba 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -37,18 +37,6 @@ menuconfig IP_SCTP
 
 if IP_SCTP
 
-config NET_SCTPPROBE
-   tristate "SCTP: Association probing"
-depends on PROC_FS && KPROBES
----help---
-This module allows for capturing the changes to SCTP association
-state in response to incoming packets. It is used for debugging
-SCTP congestion control algorithms. If you don't understand
-what was just said, you don't need it: say N.
-
-To compile this code as a module, choose M here: the
-module will be called sctp_probe.
-
 config SCTP_DBG_OBJCNT
bool "SCTP: Debug object counts"
depends on PROC_FS
diff --git a/net/sctp/Makefile b/net/sctp/Makefile
index 54bd9c1a8aa1..6776582ec449 100644
--- a/net/sctp/Makefile
+++ b/net/sctp/Makefile
@@ -4,7 +4,6 @@
 #
 
 obj-$(CONFIG_IP_SCTP) += sctp.o
-obj-$(CONFIG_NET_SCTPPROBE) += sctp_probe.o
 obj-$(CONFIG_INET_SCTP_DIAG) += sctp_diag.o
 
 sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
@@ -16,8 +15,6 @@ sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
  offload.o stream_sched.o stream_sched_prio.o \
  stream_sched_rr.o stream_interleave.o
 
-sctp_probe-y := probe.o
-
 sctp-$(CONFIG_SCTP_DBG_OBJCNT) += objcnt.o
 sctp-$(CONFIG_PROC_FS) += proc.o
 sctp-$(CONFIG_SYSCTL) += sysctl.o
diff --git a/net/sctp/probe.c b/net/sctp/probe.c
deleted file mode 100644
index 1280f85a598d..
--- a/net/sctp/probe.c
+++ /dev/null
@@ -1,244 +0,0 @@
-/*
- * sctp_probe - Observe the SCTP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * Modified for SCTP from Stephen Hemminger's code
- * Copyright (C) 2010, Wei Yongjun 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
-MODULE_SOFTDEP("pre: sctp");
-MODULE_AUTHOR("Wei Yongjun ");
-MODULE_DESCRIPTION("SCTP snooper");
-MODULE_LICENSE("GPL");
-
-static int port __read_mostly = 0;
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-static unsigned int fwmark __read_mostly = 0;
-MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)");
-module_param(fwmark, uint, 0);
-
-static int bufsize __read_mostly = 64 * 1024;
-MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)");
-module_param(bufsize, int, 0);
-
-static int full __read_mostly = 1;
-MODULE_PARM_DESC(full, "Full log (1=every ack packet received,  0=only cwnd 
changes)");
-module_param(full, int, 0);
-
-static const char procname[] = "sctpprobe";
-
-static struct {
-   struct kfifo  fifo;
-   spinlock_tlock;
-   wait_queue_head_t wait;
-   struct timespec64 tstart;
-} sctpw;
-
-static __printf(1, 2) void printl(const char *fmt, ...)
-{
-   va_list args;
-   int len;
-   char tbuf[256];
-
-   va_start(args, fmt);
-   len = vscnprintf(tbuf, sizeof(tbuf), fmt, args);
-   va_end(args);
-
-   kfifo_in_locked(, tbuf, len, );
-   wake_up();
-}
-
-static int sctpprobe_open(struct inode *inode, struct file *file)
-{
-   kfifo_reset();
-   ktime_get_ts64();
-
-   return 0;
-}
-
-static ssize_t sctpprobe_read(struct file *file, char __user *buf,
- size_t len, loff_t *ppos)
-{
-   int error = 0, cnt = 0;
-   unsigned char *tbuf;
-
-   if (!buf)
-   return -EINVAL;
-
-   if (len == 0)
-   return 0;
-
-   tbuf = vmalloc(len);
-   if (!tbuf)
-   return -ENOMEM;
-
-   error = wait_event_interruptible(sctpw.wait,
- 

[PATCH net-next v6 1/6] net: tcp: Add trace events for TCP congestion window tracing

2017-12-27 Thread Masami Hiramatsu
This adds an event to trace TCP stat variables with
slightly intrusive trace-event. This uses ftrace/perf
event log buffer to trace those state, no needs to
prepare own ring-buffer, nor custom user apps.

User can use ftrace to trace this event as below;

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/tcp/tcp_probe/enable
  (run workloads)
  # cat trace

Signed-off-by: Masami Hiramatsu 
---
 Changes in v6:
  - Avoid preprocessor directives in tracepoint macro args as
Mat did on net tree.
---
 include/trace/events/tcp.h |   97 
 net/ipv4/tcp_input.c   |3 +
 2 files changed, 100 insertions(+)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 8e88a1671538..4dea6342f7d4 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM tcp
 
@@ -8,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * tcp event with arguments sk and skb
@@ -277,6 +279,101 @@ TRACE_EVENT(tcp_retransmit_synack,
  __entry->saddr_v6, __entry->daddr_v6)
 );
 
+
+#define TP_STORE_ADDR_PORTS_V4(__entry, inet, sk)  \
+   do {\
+   struct sockaddr_in *v4 = (void *)__entry->saddr;\
+   \
+   v4->sin_family = AF_INET;   \
+   v4->sin_port = inet->inet_sport;\
+   v4->sin_addr.s_addr = inet->inet_saddr; \
+   v4 = (void *)__entry->daddr;\
+   v4->sin_family = AF_INET;   \
+   v4->sin_port = inet->inet_dport;\
+   v4->sin_addr.s_addr = inet->inet_daddr; \
+   } while (0)
+
+#if IS_ENABLED(CONFIG_IPV6)
+
+#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
+   do {\
+   if (sk->sk_family == AF_INET6) {\
+   struct sockaddr_in6 *v6 = (void *)__entry->saddr; \
+   \
+   v6->sin6_family = AF_INET6; \
+   v6->sin6_port = inet->inet_sport;   \
+   v6->sin6_addr = inet6_sk(sk)->saddr;\
+   v6 = (void *)__entry->daddr;\
+   v6->sin6_family = AF_INET6; \
+   v6->sin6_port = inet->inet_dport;   \
+   v6->sin6_addr = sk->sk_v6_daddr;\
+   } else  \
+   TP_STORE_ADDR_PORTS_V4(__entry, inet, sk);  \
+   } while (0)
+
+#else
+
+#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
+   TP_STORE_ADDR_PORTS_V4(__entry, inet, sk);
+
+#endif
+
+TRACE_EVENT(tcp_probe,
+
+   TP_PROTO(struct sock *sk, struct sk_buff *skb),
+
+   TP_ARGS(sk, skb),
+
+   TP_STRUCT__entry(
+   /* sockaddr_in6 is always bigger than sockaddr_in */
+   __array(__u8, saddr, sizeof(struct sockaddr_in6))
+   __array(__u8, daddr, sizeof(struct sockaddr_in6))
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __field(__u32, mark)
+   __field(__u16, length)
+   __field(__u32, snd_nxt)
+   __field(__u32, snd_una)
+   __field(__u32, snd_cwnd)
+   __field(__u32, ssthresh)
+   __field(__u32, snd_wnd)
+   __field(__u32, srtt)
+   __field(__u32, rcv_wnd)
+   ),
+
+   TP_fast_assign(
+   const struct tcp_sock *tp = tcp_sk(sk);
+   const struct inet_sock *inet = inet_sk(sk);
+
+   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
+
+   TP_STORE_ADDR_PORTS(__entry, inet, sk);
+
+   /* For filtering use */
+   __entry->sport = ntohs(inet->inet_sport);
+   __entry->dport = ntohs(inet->inet_dport);
+   __entry->mark = skb->mark;
+
+   __entry->length = skb->len;
+   __entry->snd_nxt = tp->snd_nxt;
+   __entry->snd_una = tp->snd_una;
+   __entry->snd_cwnd = tp->snd_cwnd;
+   __entry->snd_wnd = tp->snd_wnd;
+   __entry->rcv_wnd = tp->rcv_wnd;
+   __entry->ssthresh = tcp_current_ssthresh(sk);
+   __entry->srtt = tp->srtt_us >> 3;
+   ),
+
+   TP_printk("src=%pISpc 

Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread alex chen


On 2017/12/28 10:48, Gang He wrote:
> Hi Alex,
> 
> 

>> Hi Gang,
>>
>> On 2017/12/27 18:37, Gang He wrote:
>>> Hi Jun,
>>>
>>>
>>
 Hi Gang,

 Do you mean that too many retrys in loop cast losts of CPU-time and
 block page-fault interrupt? We should not add any delay in
 ocfs2_fault(), right? And I still feel a little confused why your
 method can solve this problem.
>>> You can see the related code in function filemap_fault(), if ocfs2 fails to 
>> read a page since 
>>> it can not get a inode lock with non-block mode, the VFS layer code will 
>> invoke ocfs2
>>> read page call back function circularly, this will lead to a softlockup 
>> problem (like the below back trace). 
>>> So, we should get a blocking lock to let the dlm lock to this node and also 
>> can avoid CPU loop,
>> Can we use 'cond_resched()' to allow the thread to release the CPU 
>> temperately for solving this softlockup?
> Yes, we can use cond_resched() function to avoid this softlockup.
> In fact, if the kernel is configured with CONFIG_PREEMPT=y, this softlockup 
> does not happen since the kernel can help.
> But, this way still leads to CPU resource waste, CPU usage can reach about 
> 80% - 100% when 
> multiple nodes read/write/mmap-access the same file concurrently, and more, 
> the read/write/mmap-access
> speed is more lower (50% decrease). 
> Why? 
> Because we need to get DLM lock for each node, before one node gets DLM lock, 
> another node has 
> to down-convert this DLM lock, that means flushing the memory data to the 
> disk before DLM lock down-conversion.
> this disk IO operation is very slow compared with CPU cycle, that means the 
> node which want to get DLM lock,
> will do lots of reties before another node complete down-converting this DLM 
> lock, actual, these retries do not make
> sense, just waste CPU cycle. 
> So, if we add a blocking lock/unlock here, we will avoid these unnecessary 
> reties, especially in case slow-speed disk and more ocfs2 nodes(>=3).
> I did the ocfs2 test case (multi_mmap in multiple_run.sh), after applied this 
> patch, the CPU rate on each node was about 40%-50%, and the test case  
> execution time reduced by half.
> the full command is as below,
> multiple_run.sh -i eth0 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C 
> hacluster -s pcmk -n nd1,nd2,nd3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap 
> /mnt/shared
> the shared storage is a iscsi disk.
> 
OK, I think it is more better if you can add you test method and result in 
change log.

Thanks,
Alex
> Thanks
> Gang
> 
>>
>>> second, base on my testing, the patch also can improve the efficiency in 
>> case modifying the same
>>> file frequently from multiple nodes, since the lock acquisition chance is 
>> more fair.
>>> In fact, the code was modified by a patch 1cce4df04f37 ("ocfs2: do not 
>> lock/unlock() inode DLM lock"),
>>> before that patch, the code is the same, this patch can be considered to 
>> revert that patch, except adding more
>>> clear comments.
>> In patch 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock"), 
>> Goldwyn says blocking lock and unlock will only make
>> the performance worse where contention over the locks is high, which is the 
>> opposite of your described above.
>> IMO, blocking lock and unlock here is indeed unnecessary.
>>
>> Thanks,
>> Alex
>>>  
>>> Thanks
>>> Gang
>>>
>>>

 thanks,
 Jun

 On 2017/12/27 17:29, Gang He wrote:
> If we can't get inode lock immediately in the function
> ocfs2_inode_lock_with_page() when reading a page, we should not
> return directly here, since this will lead to a softlockup problem.
> The method is to get a blocking lock and immediately unlock before
> returning, this can avoid CPU resource waste due to lots of retries,
> and benefits fairness in getting lock among multiple nodes, increase
> efficiency in case modifying the same file frequently from multiple
> nodes.
> The softlockup problem looks like,
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Call Trace:
>   
>   dump_stack+0x5c/0x82
>   panic+0xd5/0x21e
>   watchdog_timer_fn+0x208/0x210
>   ? watchdog_park_threads+0x70/0x70
>   __hrtimer_run_queues+0xcc/0x200
>   hrtimer_interrupt+0xa6/0x1f0
>   smp_apic_timer_interrupt+0x34/0x50
>   apic_timer_interrupt+0x96/0xa0
>   
>  RIP: 0010:unlock_page+0x17/0x30
>  RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
>  RAX: dead0100 RBX: f21e009f5300 RCX: 0004
>  RDX: dead00ff RSI: 0202 RDI: f21e009f5300
>  RBP:  R08:  R09: af154080bb00
>  R10: af154080bc30 R11: 0040 R12: 993749a39518
>  R13:  R14: f21e009f5300 R15: 

[PATCH net-next v6 2/6] net: tcp: Remove TCP probe module

2017-12-27 Thread Masami Hiramatsu
Remove TCP probe module since jprobe has been deprecated.
That function is now replaced by tcp/tcp_probe trace-event.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 net/Kconfig  |   17 ---
 net/ipv4/Makefile|1 
 net/ipv4/tcp_probe.c |  301 --
 3 files changed, 319 deletions(-)
 delete mode 100644 net/ipv4/tcp_probe.c

diff --git a/net/Kconfig b/net/Kconfig
index 9dba2715919d..efe930db3c08 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -336,23 +336,6 @@ config NET_PKTGEN
  To compile this code as a module, choose M here: the
  module will be called pktgen.
 
-config NET_TCPPROBE
-   tristate "TCP connection probing"
-   depends on INET && PROC_FS && KPROBES
-   ---help---
-   This module allows for capturing the changes to TCP connection
-   state in response to incoming packets. It is used for debugging
-   TCP congestion avoidance modules. If you don't understand
-   what was just said, you don't need it: say N.
-
-   Documentation on how to use TCP connection probing can be found
-   at:
-   
- 
http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe
-
-   To compile this code as a module, choose M here: the
-   module will be called tcp_probe.
-
 config NET_DROP_MONITOR
tristate "Network packet drop alerting service"
depends on INET && TRACEPOINTS
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index c6c8ad1d4b6d..47a0a6649a9d 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -43,7 +43,6 @@ obj-$(CONFIG_INET_DIAG) += inet_diag.o
 obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o
 obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o
 obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o
-obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
 obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o
 obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
 obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
deleted file mode 100644
index 697f4c67b2e3..
--- a/net/ipv4/tcp_probe.c
+++ /dev/null
@@ -1,301 +0,0 @@
-/*
- * tcpprobe - Observe the TCP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-MODULE_AUTHOR("Stephen Hemminger ");
-MODULE_DESCRIPTION("TCP cwnd snooper");
-MODULE_LICENSE("GPL");
-MODULE_VERSION("1.1");
-
-static int port __read_mostly;
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-static unsigned int bufsize __read_mostly = 4096;
-MODULE_PARM_DESC(bufsize, "Log buffer size in packets (4096)");
-module_param(bufsize, uint, 0);
-
-static unsigned int fwmark __read_mostly;
-MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)");
-module_param(fwmark, uint, 0);
-
-static int full __read_mostly;
-MODULE_PARM_DESC(full, "Full log (1=every ack packet received,  0=only cwnd 
changes)");
-module_param(full, int, 0);
-
-static const char procname[] = "tcpprobe";
-
-struct tcp_log {
-   ktime_t tstamp;
-   union {
-   struct sockaddr raw;
-   struct sockaddr_in  v4;
-   struct sockaddr_in6 v6;
-   }   src, dst;
-   u16 length;
-   u32 snd_nxt;
-   u32 snd_una;
-   u32 snd_wnd;
-   u32 rcv_wnd;
-   u32 snd_cwnd;
-   u32 ssthresh;
-   u32 srtt;
-};
-
-static struct {
-   spinlock_t  lock;
-   wait_queue_head_t wait;
-   ktime_t start;
-   u32 lastcwnd;
-
-   unsigned long   head, tail;
-   struct tcp_log  *log;
-} tcp_probe;
-
-static inline int tcp_probe_used(void)
-{
-   return (tcp_probe.head - tcp_probe.tail) & (bufsize - 1);
-}
-
-static inline int tcp_probe_avail(void)
-{
-   return bufsize - tcp_probe_used() - 1;
-}
-
-#define tcp_probe_copy_fl_to_si4(inet, si4, mem)   \
-   do {\
-   si4.sin_family = AF_INET;   

[PATCH net-next v6 2/6] net: tcp: Remove TCP probe module

2017-12-27 Thread Masami Hiramatsu
Remove TCP probe module since jprobe has been deprecated.
That function is now replaced by tcp/tcp_probe trace-event.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 net/Kconfig  |   17 ---
 net/ipv4/Makefile|1 
 net/ipv4/tcp_probe.c |  301 --
 3 files changed, 319 deletions(-)
 delete mode 100644 net/ipv4/tcp_probe.c

diff --git a/net/Kconfig b/net/Kconfig
index 9dba2715919d..efe930db3c08 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -336,23 +336,6 @@ config NET_PKTGEN
  To compile this code as a module, choose M here: the
  module will be called pktgen.
 
-config NET_TCPPROBE
-   tristate "TCP connection probing"
-   depends on INET && PROC_FS && KPROBES
-   ---help---
-   This module allows for capturing the changes to TCP connection
-   state in response to incoming packets. It is used for debugging
-   TCP congestion avoidance modules. If you don't understand
-   what was just said, you don't need it: say N.
-
-   Documentation on how to use TCP connection probing can be found
-   at:
-   
- 
http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe
-
-   To compile this code as a module, choose M here: the
-   module will be called tcp_probe.
-
 config NET_DROP_MONITOR
tristate "Network packet drop alerting service"
depends on INET && TRACEPOINTS
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index c6c8ad1d4b6d..47a0a6649a9d 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -43,7 +43,6 @@ obj-$(CONFIG_INET_DIAG) += inet_diag.o
 obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o
 obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o
 obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o
-obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
 obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o
 obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
 obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
deleted file mode 100644
index 697f4c67b2e3..
--- a/net/ipv4/tcp_probe.c
+++ /dev/null
@@ -1,301 +0,0 @@
-/*
- * tcpprobe - Observe the TCP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-MODULE_AUTHOR("Stephen Hemminger ");
-MODULE_DESCRIPTION("TCP cwnd snooper");
-MODULE_LICENSE("GPL");
-MODULE_VERSION("1.1");
-
-static int port __read_mostly;
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-static unsigned int bufsize __read_mostly = 4096;
-MODULE_PARM_DESC(bufsize, "Log buffer size in packets (4096)");
-module_param(bufsize, uint, 0);
-
-static unsigned int fwmark __read_mostly;
-MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)");
-module_param(fwmark, uint, 0);
-
-static int full __read_mostly;
-MODULE_PARM_DESC(full, "Full log (1=every ack packet received,  0=only cwnd 
changes)");
-module_param(full, int, 0);
-
-static const char procname[] = "tcpprobe";
-
-struct tcp_log {
-   ktime_t tstamp;
-   union {
-   struct sockaddr raw;
-   struct sockaddr_in  v4;
-   struct sockaddr_in6 v6;
-   }   src, dst;
-   u16 length;
-   u32 snd_nxt;
-   u32 snd_una;
-   u32 snd_wnd;
-   u32 rcv_wnd;
-   u32 snd_cwnd;
-   u32 ssthresh;
-   u32 srtt;
-};
-
-static struct {
-   spinlock_t  lock;
-   wait_queue_head_t wait;
-   ktime_t start;
-   u32 lastcwnd;
-
-   unsigned long   head, tail;
-   struct tcp_log  *log;
-} tcp_probe;
-
-static inline int tcp_probe_used(void)
-{
-   return (tcp_probe.head - tcp_probe.tail) & (bufsize - 1);
-}
-
-static inline int tcp_probe_avail(void)
-{
-   return bufsize - tcp_probe_used() - 1;
-}
-
-#define tcp_probe_copy_fl_to_si4(inet, si4, mem)   \
-   do {\
-   si4.sin_family = AF_INET;   \
-   si4.sin_port = inet->inet_##mem##port;  

Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread alex chen


On 2017/12/28 10:48, Gang He wrote:
> Hi Alex,
> 
> 

>> Hi Gang,
>>
>> On 2017/12/27 18:37, Gang He wrote:
>>> Hi Jun,
>>>
>>>
>>
 Hi Gang,

 Do you mean that too many retrys in loop cast losts of CPU-time and
 block page-fault interrupt? We should not add any delay in
 ocfs2_fault(), right? And I still feel a little confused why your
 method can solve this problem.
>>> You can see the related code in function filemap_fault(), if ocfs2 fails to 
>> read a page since 
>>> it can not get a inode lock with non-block mode, the VFS layer code will 
>> invoke ocfs2
>>> read page call back function circularly, this will lead to a softlockup 
>> problem (like the below back trace). 
>>> So, we should get a blocking lock to let the dlm lock to this node and also 
>> can avoid CPU loop,
>> Can we use 'cond_resched()' to allow the thread to release the CPU 
>> temperately for solving this softlockup?
> Yes, we can use cond_resched() function to avoid this softlockup.
> In fact, if the kernel is configured with CONFIG_PREEMPT=y, this softlockup 
> does not happen since the kernel can help.
> But, this way still leads to CPU resource waste, CPU usage can reach about 
> 80% - 100% when 
> multiple nodes read/write/mmap-access the same file concurrently, and more, 
> the read/write/mmap-access
> speed is more lower (50% decrease). 
> Why? 
> Because we need to get DLM lock for each node, before one node gets DLM lock, 
> another node has 
> to down-convert this DLM lock, that means flushing the memory data to the 
> disk before DLM lock down-conversion.
> this disk IO operation is very slow compared with CPU cycle, that means the 
> node which want to get DLM lock,
> will do lots of reties before another node complete down-converting this DLM 
> lock, actual, these retries do not make
> sense, just waste CPU cycle. 
> So, if we add a blocking lock/unlock here, we will avoid these unnecessary 
> reties, especially in case slow-speed disk and more ocfs2 nodes(>=3).
> I did the ocfs2 test case (multi_mmap in multiple_run.sh), after applied this 
> patch, the CPU rate on each node was about 40%-50%, and the test case  
> execution time reduced by half.
> the full command is as below,
> multiple_run.sh -i eth0 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C 
> hacluster -s pcmk -n nd1,nd2,nd3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap 
> /mnt/shared
> the shared storage is a iscsi disk.
> 
OK, I think it is more better if you can add you test method and result in 
change log.

Thanks,
Alex
> Thanks
> Gang
> 
>>
>>> second, base on my testing, the patch also can improve the efficiency in 
>> case modifying the same
>>> file frequently from multiple nodes, since the lock acquisition chance is 
>> more fair.
>>> In fact, the code was modified by a patch 1cce4df04f37 ("ocfs2: do not 
>> lock/unlock() inode DLM lock"),
>>> before that patch, the code is the same, this patch can be considered to 
>> revert that patch, except adding more
>>> clear comments.
>> In patch 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock"), 
>> Goldwyn says blocking lock and unlock will only make
>> the performance worse where contention over the locks is high, which is the 
>> opposite of your described above.
>> IMO, blocking lock and unlock here is indeed unnecessary.
>>
>> Thanks,
>> Alex
>>>  
>>> Thanks
>>> Gang
>>>
>>>

 thanks,
 Jun

 On 2017/12/27 17:29, Gang He wrote:
> If we can't get inode lock immediately in the function
> ocfs2_inode_lock_with_page() when reading a page, we should not
> return directly here, since this will lead to a softlockup problem.
> The method is to get a blocking lock and immediately unlock before
> returning, this can avoid CPU resource waste due to lots of retries,
> and benefits fairness in getting lock among multiple nodes, increase
> efficiency in case modifying the same file frequently from multiple
> nodes.
> The softlockup problem looks like,
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Call Trace:
>   
>   dump_stack+0x5c/0x82
>   panic+0xd5/0x21e
>   watchdog_timer_fn+0x208/0x210
>   ? watchdog_park_threads+0x70/0x70
>   __hrtimer_run_queues+0xcc/0x200
>   hrtimer_interrupt+0xa6/0x1f0
>   smp_apic_timer_interrupt+0x34/0x50
>   apic_timer_interrupt+0x96/0xa0
>   
>  RIP: 0010:unlock_page+0x17/0x30
>  RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
>  RAX: dead0100 RBX: f21e009f5300 RCX: 0004
>  RDX: dead00ff RSI: 0202 RDI: f21e009f5300
>  RBP:  R08:  R09: af154080bb00
>  R10: af154080bc30 R11: 0040 R12: 993749a39518
>  R13:  R14: f21e009f5300 R15: 

[PATCH net-next v6 6/6] net: dccp: Remove dccpprobe module

2017-12-27 Thread Masami Hiramatsu
Remove DCCP probe module since jprobe has been deprecated.
That function is now replaced by dccp/dccp_probe trace-event.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v5:
  - Fix a conflict with previous change in Makefile.
---
 net/dccp/Kconfig  |   17 
 net/dccp/Makefile |2 -
 net/dccp/probe.c  |  203 -
 3 files changed, 222 deletions(-)
 delete mode 100644 net/dccp/probe.c

diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig
index 8c0ef71bed2f..b270e84d9c13 100644
--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -39,23 +39,6 @@ config IP_DCCP_DEBUG
 
  Just say N.
 
-config NET_DCCPPROBE
-   tristate "DCCP connection probing"
-   depends on PROC_FS && KPROBES
-   ---help---
-   This module allows for capturing the changes to DCCP connection
-   state in response to incoming packets. It is used for debugging
-   DCCP congestion avoidance modules. If you don't understand
-   what was just said, you don't need it: say N.
-
-   Documentation on how to use DCCP connection probing can be found
-   at:
-   
- 
http://www.linuxfoundation.org/collaborate/workgroups/networking/dccpprobe
-
-   To compile this code as a module, choose M here: the
-   module will be called dccp_probe.
-
 
 endmenu
 
diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 4215f13a63af..5b4ff37bc806 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -21,12 +21,10 @@ obj-$(subst y,$(CONFIG_IP_DCCP),$(CONFIG_IPV6)) += 
dccp_ipv6.o
 dccp_ipv6-y := ipv6.o
 
 obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
-obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o
 
 dccp-$(CONFIG_SYSCTL) += sysctl.o
 
 dccp_diag-y := diag.o
-dccp_probe-y := probe.o
 
 # build with local directory for trace.h
 CFLAGS_proto.o := -I$(src)
diff --git a/net/dccp/probe.c b/net/dccp/probe.c
deleted file mode 100644
index 3d3fda05b32d..
--- a/net/dccp/probe.c
+++ /dev/null
@@ -1,203 +0,0 @@
-/*
- * dccp_probe - Observe the DCCP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * Modified for DCCP from Stephen Hemminger's code
- * Copyright (C) 2006, Ian McDonald 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "dccp.h"
-#include "ccid.h"
-#include "ccids/ccid3.h"
-
-static int port;
-
-static int bufsize = 64 * 1024;
-
-static const char procname[] = "dccpprobe";
-
-static struct {
-   struct kfifo  fifo;
-   spinlock_tlock;
-   wait_queue_head_t wait;
-   struct timespec64 tstart;
-} dccpw;
-
-static void printl(const char *fmt, ...)
-{
-   va_list args;
-   int len;
-   struct timespec64 now;
-   char tbuf[256];
-
-   va_start(args, fmt);
-   getnstimeofday64();
-
-   now = timespec64_sub(now, dccpw.tstart);
-
-   len = sprintf(tbuf, "%lu.%06lu ",
- (unsigned long) now.tv_sec,
- (unsigned long) now.tv_nsec / NSEC_PER_USEC);
-   len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
-   va_end(args);
-
-   kfifo_in_locked(, tbuf, len, );
-   wake_up();
-}
-
-static int jdccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
-{
-   const struct inet_sock *inet = inet_sk(sk);
-   struct ccid3_hc_tx_sock *hc = NULL;
-
-   if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3)
-   hc = ccid3_hc_tx_sk(sk);
-
-   if (port == 0 || ntohs(inet->inet_dport) == port ||
-   ntohs(inet->inet_sport) == port) {
-   if (hc)
-   printl("%pI4:%u %pI4:%u %d %d %d %d %u %llu %llu %d\n",
-  >inet_saddr, ntohs(inet->inet_sport),
-  >inet_daddr, ntohs(inet->inet_dport), size,
-  hc->tx_s, hc->tx_rtt, hc->tx_p,
-  hc->tx_x_calc, hc->tx_x_recv >> 6,
-  hc->tx_x >> 6, hc->tx_t_ipi);
-   else
-   printl("%pI4:%u %pI4:%u 

[PATCH net-next v6 3/6] net: sctp: Add SCTP ACK tracking trace event

2017-12-27 Thread Masami Hiramatsu
Add SCTP ACK tracking trace event to trace the changes of SCTP
association state in response to incoming packets.
It is used for debugging SCTP congestion control algorithms,
and will replace sctp_probe module.

Note that this event a bit tricky. Since this consists of 2
events (sctp_probe and sctp_probe_path) so you have to enable
both events as below.

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/sctp/sctp_probe/enable
  # echo 1 > events/sctp/sctp_probe_path/enable

Or, you can enable all the events under sctp.

  # echo 1 > events/sctp/enable

Since sctp_probe_path event is always invoked from sctp_probe
event, you can not see any output if you only enable
sctp_probe_path.

Signed-off-by: Masami Hiramatsu 
---
  Changes in v3:
   - Add checking whether sctp_probe_path event is enabled
 before iterating sctp paths to record. Thanks Steven.
  Changes in v4:
   - Move a temporal variable definition in the block.
   - Fix to cast pointer to unsigned long instead of __u64
 for 32bit environment.
---
 include/trace/events/sctp.h |   99 +++
 net/sctp/sm_statefuns.c |5 ++
 2 files changed, 104 insertions(+)
 create mode 100644 include/trace/events/sctp.h

diff --git a/include/trace/events/sctp.h b/include/trace/events/sctp.h
new file mode 100644
index ..7475c7be165a
--- /dev/null
+++ b/include/trace/events/sctp.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM sctp
+
+#if !defined(_TRACE_SCTP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SCTP_H
+
+#include 
+#include 
+
+TRACE_EVENT(sctp_probe_path,
+
+   TP_PROTO(struct sctp_transport *sp,
+const struct sctp_association *asoc),
+
+   TP_ARGS(sp, asoc),
+
+   TP_STRUCT__entry(
+   __field(__u64, asoc)
+   __field(__u32, primary)
+   __array(__u8, ipaddr, sizeof(union sctp_addr))
+   __field(__u32, state)
+   __field(__u32, cwnd)
+   __field(__u32, ssthresh)
+   __field(__u32, flight_size)
+   __field(__u32, partial_bytes_acked)
+   __field(__u32, pathmtu)
+   ),
+
+   TP_fast_assign(
+   __entry->asoc = (unsigned long)asoc;
+   __entry->primary = (sp == asoc->peer.primary_path);
+   memcpy(__entry->ipaddr, >ipaddr, sizeof(union sctp_addr));
+   __entry->state = sp->state;
+   __entry->cwnd = sp->cwnd;
+   __entry->ssthresh = sp->ssthresh;
+   __entry->flight_size = sp->flight_size;
+   __entry->partial_bytes_acked = sp->partial_bytes_acked;
+   __entry->pathmtu = sp->pathmtu;
+   ),
+
+   TP_printk("asoc=%#llx%s ipaddr=%pISpc state=%u cwnd=%u ssthresh=%u "
+ "flight_size=%u partial_bytes_acked=%u pathmtu=%u",
+ __entry->asoc, __entry->primary ? "(*)" : "",
+ __entry->ipaddr, __entry->state, __entry->cwnd,
+ __entry->ssthresh, __entry->flight_size,
+ __entry->partial_bytes_acked, __entry->pathmtu)
+);
+
+TRACE_EVENT(sctp_probe,
+
+   TP_PROTO(const struct sctp_endpoint *ep,
+const struct sctp_association *asoc,
+struct sctp_chunk *chunk),
+
+   TP_ARGS(ep, asoc, chunk),
+
+   TP_STRUCT__entry(
+   __field(__u64, asoc)
+   __field(__u32, mark)
+   __field(__u16, bind_port)
+   __field(__u16, peer_port)
+   __field(__u32, pathmtu)
+   __field(__u32, rwnd)
+   __field(__u16, unack_data)
+   ),
+
+   TP_fast_assign(
+   struct sk_buff *skb = chunk->skb;
+
+   __entry->asoc = (unsigned long)asoc;
+   __entry->mark = skb->mark;
+   __entry->bind_port = ep->base.bind_addr.port;
+   __entry->peer_port = asoc->peer.port;
+   __entry->pathmtu = asoc->pathmtu;
+   __entry->rwnd = asoc->peer.rwnd;
+   __entry->unack_data = asoc->unack_data;
+
+   if (trace_sctp_probe_path_enabled()) {
+   struct sctp_transport *sp;
+
+   list_for_each_entry(sp, >peer.transport_addr_list,
+   transports) {
+   trace_sctp_probe_path(sp, asoc);
+   }
+   }
+   ),
+
+   TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d "
+ "rwnd=%u unack_data=%d",
+ __entry->asoc, __entry->mark, __entry->bind_port,
+ __entry->peer_port, __entry->pathmtu, __entry->rwnd,
+ __entry->unack_data)
+);
+
+#endif /* _TRACE_SCTP_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 

[PATCH net-next v6 6/6] net: dccp: Remove dccpprobe module

2017-12-27 Thread Masami Hiramatsu
Remove DCCP probe module since jprobe has been deprecated.
That function is now replaced by dccp/dccp_probe trace-event.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v5:
  - Fix a conflict with previous change in Makefile.
---
 net/dccp/Kconfig  |   17 
 net/dccp/Makefile |2 -
 net/dccp/probe.c  |  203 -
 3 files changed, 222 deletions(-)
 delete mode 100644 net/dccp/probe.c

diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig
index 8c0ef71bed2f..b270e84d9c13 100644
--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -39,23 +39,6 @@ config IP_DCCP_DEBUG
 
  Just say N.
 
-config NET_DCCPPROBE
-   tristate "DCCP connection probing"
-   depends on PROC_FS && KPROBES
-   ---help---
-   This module allows for capturing the changes to DCCP connection
-   state in response to incoming packets. It is used for debugging
-   DCCP congestion avoidance modules. If you don't understand
-   what was just said, you don't need it: say N.
-
-   Documentation on how to use DCCP connection probing can be found
-   at:
-   
- 
http://www.linuxfoundation.org/collaborate/workgroups/networking/dccpprobe
-
-   To compile this code as a module, choose M here: the
-   module will be called dccp_probe.
-
 
 endmenu
 
diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 4215f13a63af..5b4ff37bc806 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -21,12 +21,10 @@ obj-$(subst y,$(CONFIG_IP_DCCP),$(CONFIG_IPV6)) += 
dccp_ipv6.o
 dccp_ipv6-y := ipv6.o
 
 obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
-obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o
 
 dccp-$(CONFIG_SYSCTL) += sysctl.o
 
 dccp_diag-y := diag.o
-dccp_probe-y := probe.o
 
 # build with local directory for trace.h
 CFLAGS_proto.o := -I$(src)
diff --git a/net/dccp/probe.c b/net/dccp/probe.c
deleted file mode 100644
index 3d3fda05b32d..
--- a/net/dccp/probe.c
+++ /dev/null
@@ -1,203 +0,0 @@
-/*
- * dccp_probe - Observe the DCCP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * Modified for DCCP from Stephen Hemminger's code
- * Copyright (C) 2006, Ian McDonald 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "dccp.h"
-#include "ccid.h"
-#include "ccids/ccid3.h"
-
-static int port;
-
-static int bufsize = 64 * 1024;
-
-static const char procname[] = "dccpprobe";
-
-static struct {
-   struct kfifo  fifo;
-   spinlock_tlock;
-   wait_queue_head_t wait;
-   struct timespec64 tstart;
-} dccpw;
-
-static void printl(const char *fmt, ...)
-{
-   va_list args;
-   int len;
-   struct timespec64 now;
-   char tbuf[256];
-
-   va_start(args, fmt);
-   getnstimeofday64();
-
-   now = timespec64_sub(now, dccpw.tstart);
-
-   len = sprintf(tbuf, "%lu.%06lu ",
- (unsigned long) now.tv_sec,
- (unsigned long) now.tv_nsec / NSEC_PER_USEC);
-   len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
-   va_end(args);
-
-   kfifo_in_locked(, tbuf, len, );
-   wake_up();
-}
-
-static int jdccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
-{
-   const struct inet_sock *inet = inet_sk(sk);
-   struct ccid3_hc_tx_sock *hc = NULL;
-
-   if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3)
-   hc = ccid3_hc_tx_sk(sk);
-
-   if (port == 0 || ntohs(inet->inet_dport) == port ||
-   ntohs(inet->inet_sport) == port) {
-   if (hc)
-   printl("%pI4:%u %pI4:%u %d %d %d %d %u %llu %llu %d\n",
-  >inet_saddr, ntohs(inet->inet_sport),
-  >inet_daddr, ntohs(inet->inet_dport), size,
-  hc->tx_s, hc->tx_rtt, hc->tx_p,
-  hc->tx_x_calc, hc->tx_x_recv >> 6,
-  hc->tx_x >> 6, hc->tx_t_ipi);
-   else
-   printl("%pI4:%u %pI4:%u %d\n",
-  >inet_saddr, 

[PATCH net-next v6 3/6] net: sctp: Add SCTP ACK tracking trace event

2017-12-27 Thread Masami Hiramatsu
Add SCTP ACK tracking trace event to trace the changes of SCTP
association state in response to incoming packets.
It is used for debugging SCTP congestion control algorithms,
and will replace sctp_probe module.

Note that this event a bit tricky. Since this consists of 2
events (sctp_probe and sctp_probe_path) so you have to enable
both events as below.

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/sctp/sctp_probe/enable
  # echo 1 > events/sctp/sctp_probe_path/enable

Or, you can enable all the events under sctp.

  # echo 1 > events/sctp/enable

Since sctp_probe_path event is always invoked from sctp_probe
event, you can not see any output if you only enable
sctp_probe_path.

Signed-off-by: Masami Hiramatsu 
---
  Changes in v3:
   - Add checking whether sctp_probe_path event is enabled
 before iterating sctp paths to record. Thanks Steven.
  Changes in v4:
   - Move a temporal variable definition in the block.
   - Fix to cast pointer to unsigned long instead of __u64
 for 32bit environment.
---
 include/trace/events/sctp.h |   99 +++
 net/sctp/sm_statefuns.c |5 ++
 2 files changed, 104 insertions(+)
 create mode 100644 include/trace/events/sctp.h

diff --git a/include/trace/events/sctp.h b/include/trace/events/sctp.h
new file mode 100644
index ..7475c7be165a
--- /dev/null
+++ b/include/trace/events/sctp.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM sctp
+
+#if !defined(_TRACE_SCTP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SCTP_H
+
+#include 
+#include 
+
+TRACE_EVENT(sctp_probe_path,
+
+   TP_PROTO(struct sctp_transport *sp,
+const struct sctp_association *asoc),
+
+   TP_ARGS(sp, asoc),
+
+   TP_STRUCT__entry(
+   __field(__u64, asoc)
+   __field(__u32, primary)
+   __array(__u8, ipaddr, sizeof(union sctp_addr))
+   __field(__u32, state)
+   __field(__u32, cwnd)
+   __field(__u32, ssthresh)
+   __field(__u32, flight_size)
+   __field(__u32, partial_bytes_acked)
+   __field(__u32, pathmtu)
+   ),
+
+   TP_fast_assign(
+   __entry->asoc = (unsigned long)asoc;
+   __entry->primary = (sp == asoc->peer.primary_path);
+   memcpy(__entry->ipaddr, >ipaddr, sizeof(union sctp_addr));
+   __entry->state = sp->state;
+   __entry->cwnd = sp->cwnd;
+   __entry->ssthresh = sp->ssthresh;
+   __entry->flight_size = sp->flight_size;
+   __entry->partial_bytes_acked = sp->partial_bytes_acked;
+   __entry->pathmtu = sp->pathmtu;
+   ),
+
+   TP_printk("asoc=%#llx%s ipaddr=%pISpc state=%u cwnd=%u ssthresh=%u "
+ "flight_size=%u partial_bytes_acked=%u pathmtu=%u",
+ __entry->asoc, __entry->primary ? "(*)" : "",
+ __entry->ipaddr, __entry->state, __entry->cwnd,
+ __entry->ssthresh, __entry->flight_size,
+ __entry->partial_bytes_acked, __entry->pathmtu)
+);
+
+TRACE_EVENT(sctp_probe,
+
+   TP_PROTO(const struct sctp_endpoint *ep,
+const struct sctp_association *asoc,
+struct sctp_chunk *chunk),
+
+   TP_ARGS(ep, asoc, chunk),
+
+   TP_STRUCT__entry(
+   __field(__u64, asoc)
+   __field(__u32, mark)
+   __field(__u16, bind_port)
+   __field(__u16, peer_port)
+   __field(__u32, pathmtu)
+   __field(__u32, rwnd)
+   __field(__u16, unack_data)
+   ),
+
+   TP_fast_assign(
+   struct sk_buff *skb = chunk->skb;
+
+   __entry->asoc = (unsigned long)asoc;
+   __entry->mark = skb->mark;
+   __entry->bind_port = ep->base.bind_addr.port;
+   __entry->peer_port = asoc->peer.port;
+   __entry->pathmtu = asoc->pathmtu;
+   __entry->rwnd = asoc->peer.rwnd;
+   __entry->unack_data = asoc->unack_data;
+
+   if (trace_sctp_probe_path_enabled()) {
+   struct sctp_transport *sp;
+
+   list_for_each_entry(sp, >peer.transport_addr_list,
+   transports) {
+   trace_sctp_probe_path(sp, asoc);
+   }
+   }
+   ),
+
+   TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d "
+ "rwnd=%u unack_data=%d",
+ __entry->asoc, __entry->mark, __entry->bind_port,
+ __entry->peer_port, __entry->pathmtu, __entry->rwnd,
+ __entry->unack_data)
+);
+
+#endif /* _TRACE_SCTP_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 541f34735346..eb7905ffe5f2 

[PATCH net-next v6 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events

2017-12-27 Thread Masami Hiramatsu
Hi,

This series is v6 of the replacement of jprobe usage with trace
events. This version fixes trace/events/tcp.h to avoid sparse
warning, as same as Mat Martineau's patch.

Previous version is here;
 https://www.spinics.net/lists/netdev/msg474057.html

Changes from v5:
  [1/6]: Avoid preprocessor directives in tracepoint macro args

Thank you,

---

Masami Hiramatsu (6):
  net: tcp: Add trace events for TCP congestion window tracing
  net: tcp: Remove TCP probe module
  net: sctp: Add SCTP ACK tracking trace event
  net: sctp: Remove debug SCTP probe module
  net: dccp: Add DCCP sendmsg trace event
  net: dccp: Remove dccpprobe module


 include/trace/events/sctp.h |   99 ++
 include/trace/events/tcp.h  |   97 ++
 net/Kconfig |   17 --
 net/dccp/Kconfig|   17 --
 net/dccp/Makefile   |5 -
 net/dccp/probe.c|  203 -
 net/dccp/proto.c|5 +
 net/dccp/trace.h|  105 +++
 net/ipv4/Makefile   |1 
 net/ipv4/tcp_input.c|3 
 net/ipv4/tcp_probe.c|  301 ---
 net/sctp/Kconfig|   12 --
 net/sctp/Makefile   |3 
 net/sctp/probe.c|  244 ---
 net/sctp/sm_statefuns.c |5 +
 15 files changed, 317 insertions(+), 800 deletions(-)
 create mode 100644 include/trace/events/sctp.h
 delete mode 100644 net/dccp/probe.c
 create mode 100644 net/dccp/trace.h
 delete mode 100644 net/ipv4/tcp_probe.c
 delete mode 100644 net/sctp/probe.c

--
Masami Hiramatsu (Linaro) 


[PATCH net-next v6 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events

2017-12-27 Thread Masami Hiramatsu
Hi,

This series is v6 of the replacement of jprobe usage with trace
events. This version fixes trace/events/tcp.h to avoid sparse
warning, as same as Mat Martineau's patch.

Previous version is here;
 https://www.spinics.net/lists/netdev/msg474057.html

Changes from v5:
  [1/6]: Avoid preprocessor directives in tracepoint macro args

Thank you,

---

Masami Hiramatsu (6):
  net: tcp: Add trace events for TCP congestion window tracing
  net: tcp: Remove TCP probe module
  net: sctp: Add SCTP ACK tracking trace event
  net: sctp: Remove debug SCTP probe module
  net: dccp: Add DCCP sendmsg trace event
  net: dccp: Remove dccpprobe module


 include/trace/events/sctp.h |   99 ++
 include/trace/events/tcp.h  |   97 ++
 net/Kconfig |   17 --
 net/dccp/Kconfig|   17 --
 net/dccp/Makefile   |5 -
 net/dccp/probe.c|  203 -
 net/dccp/proto.c|5 +
 net/dccp/trace.h|  105 +++
 net/ipv4/Makefile   |1 
 net/ipv4/tcp_input.c|3 
 net/ipv4/tcp_probe.c|  301 ---
 net/sctp/Kconfig|   12 --
 net/sctp/Makefile   |3 
 net/sctp/probe.c|  244 ---
 net/sctp/sm_statefuns.c |5 +
 15 files changed, 317 insertions(+), 800 deletions(-)
 create mode 100644 include/trace/events/sctp.h
 delete mode 100644 net/dccp/probe.c
 create mode 100644 net/dccp/trace.h
 delete mode 100644 net/ipv4/tcp_probe.c
 delete mode 100644 net/sctp/probe.c

--
Masami Hiramatsu (Linaro) 


Re: [PATCH 2/2] staging: iio: add spaces around '-' operator

2017-12-27 Thread Julia Lawall


On Wed, 27 Dec 2017, Ji-Hun Kim wrote:

> Clean up checkpatch warning:
> CHECK: spaces preferred around that '-' (ctx:VxV)
>
> Signed-off-by: Ji-Hun Kim 
> ---
>  drivers/staging/iio/adc/ad7192.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/iio/adc/ad7192.c 
> b/drivers/staging/iio/adc/ad7192.c
> index f015955..c4eff71 100644
> --- a/drivers/staging/iio/adc/ad7192.c
> +++ b/drivers/staging/iio/adc/ad7192.c
> @@ -340,7 +340,7 @@ ad7192_show_scale_available(struct device *dev,
>  }
>
>  static IIO_DEVICE_ATTR_NAMED(in_v_m_v_scale_available,
> -  in_voltage-voltage_scale_available,
> +  in_voltage - voltage_scale_available,

I think this has been discussed at length before, and it is a hyphen not a
subtraction.  IIO_DEVICE_ATTR_NAMED is a macro, as indicated by the
capital letters, and you have to look and see what the code expands into.

julia

>0444, ad7192_show_scale_available, NULL, 0);
>
>  static IIO_DEVICE_ATTR(in_voltage_scale_available, 0444,
> --
> 2.10.1 (Apple Git-78)
>
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


Re: [PATCH 2/2] staging: iio: add spaces around '-' operator

2017-12-27 Thread Julia Lawall


On Wed, 27 Dec 2017, Ji-Hun Kim wrote:

> Clean up checkpatch warning:
> CHECK: spaces preferred around that '-' (ctx:VxV)
>
> Signed-off-by: Ji-Hun Kim 
> ---
>  drivers/staging/iio/adc/ad7192.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/iio/adc/ad7192.c 
> b/drivers/staging/iio/adc/ad7192.c
> index f015955..c4eff71 100644
> --- a/drivers/staging/iio/adc/ad7192.c
> +++ b/drivers/staging/iio/adc/ad7192.c
> @@ -340,7 +340,7 @@ ad7192_show_scale_available(struct device *dev,
>  }
>
>  static IIO_DEVICE_ATTR_NAMED(in_v_m_v_scale_available,
> -  in_voltage-voltage_scale_available,
> +  in_voltage - voltage_scale_available,

I think this has been discussed at length before, and it is a hyphen not a
subtraction.  IIO_DEVICE_ATTR_NAMED is a macro, as indicated by the
capital letters, and you have to look and see what the code expands into.

julia

>0444, ad7192_show_scale_available, NULL, 0);
>
>  static IIO_DEVICE_ATTR(in_voltage_scale_available, 0444,
> --
> 2.10.1 (Apple Git-78)
>
> --
> To unsubscribe from this list: send the line "unsubscribe kernel-janitors" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


Re: [PATCH 4.14 00/74] 4.14.10-stable review

2017-12-27 Thread Naresh Kamboju
On 27 December 2017 at 22:15, Greg Kroah-Hartman
 wrote:
> This is the start of the stable review cycle for the 4.14.10 release.
> There are 74 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Dec 29 16:45:52 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.10-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64 and arm.
x86_64 build results will be shared soon in this email thread.

Summary


kernel: 4.14.10-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 50f9cbb27797c6cf4d22ecb1eef0043a10a120cd
git describe: v4.14.9-75-g50f9cbb27797
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.9-75-g50f9cbb27797


No regressions (compared to build v4.14.9)

Boards, architectures and test suites:
-

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 46, skip: 16
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 983, skip: 121
* ltp-timers-tests - pass: 12,

juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 45, skip: 17
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 987, skip: 121
* ltp-timers-tests - pass: 12,

x15 - arm
* boot - pass: 20,
* kselftest - pass: 41, skip: 20
* libhugetlbfs - pass: 87, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 20, skip: 2
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 13, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1037, skip: 66
* ltp-timers-tests - pass: 12,

Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Tested-by: Naresh Kamboju 


Re: [PATCH 4.14 00/74] 4.14.10-stable review

2017-12-27 Thread Naresh Kamboju
On 27 December 2017 at 22:15, Greg Kroah-Hartman
 wrote:
> This is the start of the stable review cycle for the 4.14.10 release.
> There are 74 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Dec 29 16:45:52 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.10-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64 and arm.
x86_64 build results will be shared soon in this email thread.

Summary


kernel: 4.14.10-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 50f9cbb27797c6cf4d22ecb1eef0043a10a120cd
git describe: v4.14.9-75-g50f9cbb27797
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.9-75-g50f9cbb27797


No regressions (compared to build v4.14.9)

Boards, architectures and test suites:
-

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 46, skip: 16
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 983, skip: 121
* ltp-timers-tests - pass: 12,

juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 45, skip: 17
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 987, skip: 121
* ltp-timers-tests - pass: 12,

x15 - arm
* boot - pass: 20,
* kselftest - pass: 41, skip: 20
* libhugetlbfs - pass: 87, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 20, skip: 2
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 13, skip: 1
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1037, skip: 66
* ltp-timers-tests - pass: 12,

Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports
Tested-by: Naresh Kamboju 


Re: [PATCH] pinctrl: spear: Delete an error message for a failed memory allocation in spear_pinctrl_probe()

2017-12-27 Thread Viresh Kumar
On 27-12-17, 22:48, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Wed, 27 Dec 2017 22:44:04 +0100
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/pinctrl/spear/pinctrl-spear.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/pinctrl/spear/pinctrl-spear.c 
> b/drivers/pinctrl/spear/pinctrl-spear.c
> index 4db52ba38d8d..efe79d3f7659 100644
> --- a/drivers/pinctrl/spear/pinctrl-spear.c
> +++ b/drivers/pinctrl/spear/pinctrl-spear.c
> @@ -361,10 +361,8 @@ int spear_pinctrl_probe(struct platform_device *pdev,
>   return -ENODEV;
>  
>   pmx = devm_kzalloc(>dev, sizeof(*pmx), GFP_KERNEL);
> - if (!pmx) {
> - dev_err(>dev, "Can't alloc spear_pmx\n");
> + if (!pmx)
>   return -ENOMEM;
> - }
>  
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   pmx->vbase = devm_ioremap_resource(>dev, res);

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH] pinctrl: spear: Delete an error message for a failed memory allocation in spear_pinctrl_probe()

2017-12-27 Thread Viresh Kumar
On 27-12-17, 22:48, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Wed, 27 Dec 2017 22:44:04 +0100
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/pinctrl/spear/pinctrl-spear.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/pinctrl/spear/pinctrl-spear.c 
> b/drivers/pinctrl/spear/pinctrl-spear.c
> index 4db52ba38d8d..efe79d3f7659 100644
> --- a/drivers/pinctrl/spear/pinctrl-spear.c
> +++ b/drivers/pinctrl/spear/pinctrl-spear.c
> @@ -361,10 +361,8 @@ int spear_pinctrl_probe(struct platform_device *pdev,
>   return -ENODEV;
>  
>   pmx = devm_kzalloc(>dev, sizeof(*pmx), GFP_KERNEL);
> - if (!pmx) {
> - dev_err(>dev, "Can't alloc spear_pmx\n");
> + if (!pmx)
>   return -ENOMEM;
> - }
>  
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   pmx->vbase = devm_ioremap_resource(>dev, res);

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH] pinctrl/spear/plgpio: Delete two error messages for a failed memory allocation in plgpio_probe()

2017-12-27 Thread Viresh Kumar
On 27-12-17, 22:39, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Wed, 27 Dec 2017 22:34:28 +0100
> 
> Omit extra messages for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/pinctrl/spear/pinctrl-plgpio.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pinctrl/spear/pinctrl-plgpio.c 
> b/drivers/pinctrl/spear/pinctrl-plgpio.c
> index 6a0ed8ab33b9..d2123e396b29 100644
> --- a/drivers/pinctrl/spear/pinctrl-plgpio.c
> +++ b/drivers/pinctrl/spear/pinctrl-plgpio.c
> @@ -519,10 +519,8 @@ static int plgpio_probe(struct platform_device *pdev)
>   int ret, irq;
>  
>   plgpio = devm_kzalloc(>dev, sizeof(*plgpio), GFP_KERNEL);
> - if (!plgpio) {
> - dev_err(>dev, "memory allocation fail\n");
> + if (!plgpio)
>   return -ENOMEM;
> - }
>  
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   plgpio->base = devm_ioremap_resource(>dev, res);
> @@ -544,10 +542,8 @@ static int plgpio_probe(struct platform_device *pdev)
>   sizeof(*plgpio->csave_regs) *
>   DIV_ROUND_UP(plgpio->chip.ngpio, MAX_GPIO_PER_REG),
>   GFP_KERNEL);
> - if (!plgpio->csave_regs) {
> - dev_err(>dev, "csave registers memory allocation fail\n");
> + if (!plgpio->csave_regs)
>   return -ENOMEM;
> - }
>  #endif
>  
>   platform_set_drvdata(pdev, plgpio);

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH] pinctrl/spear/plgpio: Delete two error messages for a failed memory allocation in plgpio_probe()

2017-12-27 Thread Viresh Kumar
On 27-12-17, 22:39, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Wed, 27 Dec 2017 22:34:28 +0100
> 
> Omit extra messages for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/pinctrl/spear/pinctrl-plgpio.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pinctrl/spear/pinctrl-plgpio.c 
> b/drivers/pinctrl/spear/pinctrl-plgpio.c
> index 6a0ed8ab33b9..d2123e396b29 100644
> --- a/drivers/pinctrl/spear/pinctrl-plgpio.c
> +++ b/drivers/pinctrl/spear/pinctrl-plgpio.c
> @@ -519,10 +519,8 @@ static int plgpio_probe(struct platform_device *pdev)
>   int ret, irq;
>  
>   plgpio = devm_kzalloc(>dev, sizeof(*plgpio), GFP_KERNEL);
> - if (!plgpio) {
> - dev_err(>dev, "memory allocation fail\n");
> + if (!plgpio)
>   return -ENOMEM;
> - }
>  
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   plgpio->base = devm_ioremap_resource(>dev, res);
> @@ -544,10 +542,8 @@ static int plgpio_probe(struct platform_device *pdev)
>   sizeof(*plgpio->csave_regs) *
>   DIV_ROUND_UP(plgpio->chip.ngpio, MAX_GPIO_PER_REG),
>   GFP_KERNEL);
> - if (!plgpio->csave_regs) {
> - dev_err(>dev, "csave registers memory allocation fail\n");
> + if (!plgpio->csave_regs)
>   return -ENOMEM;
> - }
>  #endif
>  
>   platform_set_drvdata(pdev, plgpio);

Acked-by: Viresh Kumar 

-- 
viresh


DIRECTOR IN CHARGE: DR.PATRICE TEME

2017-12-27 Thread U N-Headquarters
UN Visitor Centre
Department of Public Information
United Nations Headquarters
Room DHL-1B-154
New York, NY 10017
E-mail:un...@teewars.org

United Nations Compensation Unit, In Affiliation with World Bank Our Ref: 
UN/WBO/042UK/2017.

Congratulations Beneficiary,

How are you today? Hope all is well with you and family? You may not understand 
why this mail came to you. We have been having a meeting for the past 7 months 
which just ended few days ago with the secretary to the UNITED NATIONS. This 
email is to all the people that have been scammed in any part of the world, the 
UNITED NATIONS in Affiliation with WORLD BANK have agreed to compensate them 
with the sum of USD US$980,000.00 Dollars.

This includes every foreign contractors that may have not received their 
contract sum, and people that have had an unfinished transaction or 
international businesses that failed due to Government problems etc. We found 
your name in the list of those who are to benefit from these compensation 
exercise and that is why we are contacting you, this have been agreed upon and 
have been signed. You are advised to contact Dr.PATRICE TEME of our paying 
center in Africa, as he is our representative in Nigeria, contact him 
immediately for your Cheque/ International Bank Draft of US$980,000.00 Dollars.

This fund is in form of a Bank Draft for security purpose ok? So he will send 
it to you and you can clear it in any bank of your choice. Therefore, you 
should send him your full Name and telephone number your correct mailing 
address where you want him to send the Draft to you. Contact Dr.PATRICE TEME of 
MAGNUM PLC PAYMENT CENTER with your payment code:ST/DPI/829 immediately for 
your Cheque at the given address below:

DIRECTOR IN CHARGE: DR.PATRICE TEME
E-MAIL:customerservice-magnumb...@smallinvestors.net
TELEPHONE:+ 234-817-008-4240
FAX: +234-817-008-4240

I apologize on behalf of my organization for any delay you might have 
encountered in receiving your fund in the past. Thanks and God bless you and 
your family. Hoping to hear from you as soon as you cash your Bank Draft. 
Making the world a better place.

You are required to contact the above person and furnish him with the following 
of your information that will be required to avoid any mistakes:-

1. Your Full Name :
2. Your Home/Mobile Telephone No:
3. Your Home or Office Address :
4. Age/Occupation/Marital Status:
5. Scanned copy of your identification:

Congratulations, and I look forward to hear from you as soon as you confirm 
your payment making the world a better place
 
VPTQEFPMBZJVFXNXYIXJBBOLMIQOJNMTOKZJMR


DIRECTOR IN CHARGE: DR.PATRICE TEME

2017-12-27 Thread U N-Headquarters
UN Visitor Centre
Department of Public Information
United Nations Headquarters
Room DHL-1B-154
New York, NY 10017
E-mail:un...@teewars.org

United Nations Compensation Unit, In Affiliation with World Bank Our Ref: 
UN/WBO/042UK/2017.

Congratulations Beneficiary,

How are you today? Hope all is well with you and family? You may not understand 
why this mail came to you. We have been having a meeting for the past 7 months 
which just ended few days ago with the secretary to the UNITED NATIONS. This 
email is to all the people that have been scammed in any part of the world, the 
UNITED NATIONS in Affiliation with WORLD BANK have agreed to compensate them 
with the sum of USD US$980,000.00 Dollars.

This includes every foreign contractors that may have not received their 
contract sum, and people that have had an unfinished transaction or 
international businesses that failed due to Government problems etc. We found 
your name in the list of those who are to benefit from these compensation 
exercise and that is why we are contacting you, this have been agreed upon and 
have been signed. You are advised to contact Dr.PATRICE TEME of our paying 
center in Africa, as he is our representative in Nigeria, contact him 
immediately for your Cheque/ International Bank Draft of US$980,000.00 Dollars.

This fund is in form of a Bank Draft for security purpose ok? So he will send 
it to you and you can clear it in any bank of your choice. Therefore, you 
should send him your full Name and telephone number your correct mailing 
address where you want him to send the Draft to you. Contact Dr.PATRICE TEME of 
MAGNUM PLC PAYMENT CENTER with your payment code:ST/DPI/829 immediately for 
your Cheque at the given address below:

DIRECTOR IN CHARGE: DR.PATRICE TEME
E-MAIL:customerservice-magnumb...@smallinvestors.net
TELEPHONE:+ 234-817-008-4240
FAX: +234-817-008-4240

I apologize on behalf of my organization for any delay you might have 
encountered in receiving your fund in the past. Thanks and God bless you and 
your family. Hoping to hear from you as soon as you cash your Bank Draft. 
Making the world a better place.

You are required to contact the above person and furnish him with the following 
of your information that will be required to avoid any mistakes:-

1. Your Full Name :
2. Your Home/Mobile Telephone No:
3. Your Home or Office Address :
4. Age/Occupation/Marital Status:
5. Scanned copy of your identification:

Congratulations, and I look forward to hear from you as soon as you confirm 
your payment making the world a better place
 
VPTQEFPMBZJVFXNXYIXJBBOLMIQOJNMTOKZJMR


Re: [PATCH] objtool: Fix clang enum conversion warning

2017-12-27 Thread Josh Poimboeuf
On Wed, Dec 27, 2017 at 10:42:45PM -0500, Nick Desaulniers wrote:
> On Wed, Dec 27, 2017 at 12:38 PM, Josh Poimboeuf  wrote:
> > On Wed, Dec 27, 2017 at 01:34:34PM +0100, Lukas Bulwahn wrote:
> >> Assuming that the authorship of this one-line change does not matter, as it
> >> is largely suggested by the clang compiler anyway, and we want to move the
> >> change forward, we should decide on which of three patches to move
> >> forward. I can give my Reviewed-by and Tested-by to any of them.
> 
> I suppose Ingo would take the first and accumulate Reviewed-By tags.
> I don't particularly care about authorship (please just fix the bug).
> 
> > The patch from Lukas was the first one I received, so that's the one I
> > used.  I rewrote the commit msg for clarity and added my SOB and sent it
> > to Ingo for merging.
> 
> I think you should have kept Nicholas Mc Guire's Reviewed by tag?
> Maybe Ingo can re-add his and mine when merging?

Yes, sorry, I missed that one.

Ingo, can you please add the following?

Reviewed-by: Nicholas Mc Guire 
Reviewed-by: Nick Desaulniers 

-- 
Josh


Re: [PATCH] objtool: Fix clang enum conversion warning

2017-12-27 Thread Josh Poimboeuf
On Wed, Dec 27, 2017 at 10:42:45PM -0500, Nick Desaulniers wrote:
> On Wed, Dec 27, 2017 at 12:38 PM, Josh Poimboeuf  wrote:
> > On Wed, Dec 27, 2017 at 01:34:34PM +0100, Lukas Bulwahn wrote:
> >> Assuming that the authorship of this one-line change does not matter, as it
> >> is largely suggested by the clang compiler anyway, and we want to move the
> >> change forward, we should decide on which of three patches to move
> >> forward. I can give my Reviewed-by and Tested-by to any of them.
> 
> I suppose Ingo would take the first and accumulate Reviewed-By tags.
> I don't particularly care about authorship (please just fix the bug).
> 
> > The patch from Lukas was the first one I received, so that's the one I
> > used.  I rewrote the commit msg for clarity and added my SOB and sent it
> > to Ingo for merging.
> 
> I think you should have kept Nicholas Mc Guire's Reviewed by tag?
> Maybe Ingo can re-add his and mine when merging?

Yes, sorry, I missed that one.

Ingo, can you please add the following?

Reviewed-by: Nicholas Mc Guire 
Reviewed-by: Nick Desaulniers 

-- 
Josh


Re: [f2fs-dev] [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-12-27 Thread Hyunchul Lee
Hi Jaegeuk,

On 12/28/2017 12:26 PM, Jaegeuk Kim wrote:
> On 12/23, Chao Yu wrote:
>> On 2017/12/15 10:06, Jaegeuk Kim wrote:
>>> On 12/14, Hyunchul Lee wrote:
 Hi Jaegeuk,

 I need your comment about the fs_iohint mount option.

 a) w/o fs_iohint, propagate user hints to low layer.
 b) w/ fs_iohint, ignore user hints, and use hints which is generated
 with F2FS.

 Chao suggests this option. because user hints are more accurate than
 file system.

 This is resonable, But I have some concerns about this option. 
 The first thing is that blocks of a segments have different hints. This
 could make GC less effective. 
 The second is that the separation between LIFE_MEDIUM and LIFE_LONG is 
 really needed. I think that difference between them is a little ambigous 
 for users, and LIFE_SHORT and LIFE_EXTREME is converted to different 
 hints by F2FS.
>>>
>>> I think what we really can do would assign many user hints to our 3 DATA
>>> logs likewise rw_hint_to_seg_type(), since it's just hints for user data.
>>> Then, we can decide how to keep that as much as possible, since we have
>>> another filesystem metadata such as meta and nodes. In addition, I don't
>>> think we have to keep the original user-hints which makes F2FS logs be
>>> messed up.
>>>
>>> With that mind, I can think of the below cases. Especially, if user wants
>>> to keep their io_hints, we'd better recommend to use direct_io w/o 
>>> fs_iohints.
>>
>>
>>
>>> In order to keep this policy, I think fs_iohints would be better to be a
>>> feature set by mkfs.f2fs and detected by sysfs entries for users.
>>>
>>> 1) w/ fs_iohints
>>>
>>> UserF2FS   Block
>>> ---
>>> Meta   WRITE_LIFE_MEDIUM
>>> HOT_NODE   WRITE_LIFE_NOTSET
>>> WARM_NODE  -'
>>> COLD_NODE  WRITE_LIFE_NONE
>>> ioctl(cold) COLD_DATA  WRITE_LIFE_EXTREME
>>> extention list  -' -'
>>> WRITE_LIFE_EXTREME  -' -'
>>> WRITE_LIFE_SHORTHOT_DATA   WRITE_LIFE_SHORT
>>>
>>> -- buffered_io
>>> WRITE_LIFE_NOT_SET  WARM_DATA  WRITE_LIFE_LONG
>>> WRITE_LIFE_NONE -' -'
>>> WRITE_LIFE_MEDIUM   -' -'
>>> WRITE_LIFE_LONG -' -'
>>>
>>> -- direct_io (Not recommendable)
>>> WRITE_LIFE_NOT_SET  WARM_DATA  WRITE_LIFE_NOT_SET
>>> WRITE_LIFE_NONE -' WRITE_LIFE_NONE
>>> WRITE_LIFE_MEDIUM   -' WRITE_LIFE_MEDIUM
>>> WRITE_LIFE_LONG -' WRITE_LIFE_LONG
>>
>> Agreed with above IO hint mapping rule.
>>
>>>
>>> 2) w/o fs_iohints
>>>
>>> UserF2FS   Block
>>> ---
>>> Meta   -
>>> HOT_NODE   -
>>> WARM_NODE  -
>>> COLD_NODE  -
>>> ioctl(cold) COLD_DATA  -
>>> extention list  -' -
>>>
>>> -- buffered_io
>>> WRITE_LIFE_EXTREME  COLD_DATA  -
>>> WRITE_LIFE_SHORTHOT_DATA   -
>>> WRITE_LIFE_NOT_SET  WARM_DATA  -
>>> WRITE_LIFE_NONE -' -
>>> WRITE_LIFE_MEDIUM   -' -
>>> WRITE_LIFE_LONG -' -
>>
>> Now we recommend direct_io if user wants to give IO hint for storage, I 
>> suspect
>> that user would suffer performance regression issue w/o buffered IO.
>>
>> Another problem is that, now, in Android, it will be very hard to prompt
>> application to migrate their IO pattern from buffered IO to direct IO, one
>> possible way is distinguishing user data lifetime from FWK, e.g. set
>> WRITE_LIFE_SHORT for cache file or tmp file, set WRITE_LIFE_EXTREME for 
>> media file.
>>
>> In order to support buffered_io, would it be better to change mapping as 
>> below?
>>
>> -- buffered_io
>> WRITE_LIFE_EXTREME  COLD_DATA  WRITE_LIFE_EXTREME
>> WRITE_LIFE_SHORTHOT_DATA   WRITE_LIFE_SHORT
>> WRITE_LIFE_NOT_SET  WARM_DATA  WRITE_LIFE_NOT_SET
>> WRITE_LIFE_NONE -' -'
>> WRITE_LIFE_MEDIUM   -' -'
>> WRITE_LIFE_LONG -' -'
> 
> Agreed, and it makes more sense that we'd better keep the write hints on
> userdata given by applications.
> 
> BTW, since we couldn't get any performance numbers with these, how about
> adding a mount option like "-o iohints=MODE" where MODE may be one of
> "fs-based", 

Re: [f2fs-dev] [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-12-27 Thread Hyunchul Lee
Hi Jaegeuk,

On 12/28/2017 12:26 PM, Jaegeuk Kim wrote:
> On 12/23, Chao Yu wrote:
>> On 2017/12/15 10:06, Jaegeuk Kim wrote:
>>> On 12/14, Hyunchul Lee wrote:
 Hi Jaegeuk,

 I need your comment about the fs_iohint mount option.

 a) w/o fs_iohint, propagate user hints to low layer.
 b) w/ fs_iohint, ignore user hints, and use hints which is generated
 with F2FS.

 Chao suggests this option. because user hints are more accurate than
 file system.

 This is resonable, But I have some concerns about this option. 
 The first thing is that blocks of a segments have different hints. This
 could make GC less effective. 
 The second is that the separation between LIFE_MEDIUM and LIFE_LONG is 
 really needed. I think that difference between them is a little ambigous 
 for users, and LIFE_SHORT and LIFE_EXTREME is converted to different 
 hints by F2FS.
>>>
>>> I think what we really can do would assign many user hints to our 3 DATA
>>> logs likewise rw_hint_to_seg_type(), since it's just hints for user data.
>>> Then, we can decide how to keep that as much as possible, since we have
>>> another filesystem metadata such as meta and nodes. In addition, I don't
>>> think we have to keep the original user-hints which makes F2FS logs be
>>> messed up.
>>>
>>> With that mind, I can think of the below cases. Especially, if user wants
>>> to keep their io_hints, we'd better recommend to use direct_io w/o 
>>> fs_iohints.
>>
>>
>>
>>> In order to keep this policy, I think fs_iohints would be better to be a
>>> feature set by mkfs.f2fs and detected by sysfs entries for users.
>>>
>>> 1) w/ fs_iohints
>>>
>>> UserF2FS   Block
>>> ---
>>> Meta   WRITE_LIFE_MEDIUM
>>> HOT_NODE   WRITE_LIFE_NOTSET
>>> WARM_NODE  -'
>>> COLD_NODE  WRITE_LIFE_NONE
>>> ioctl(cold) COLD_DATA  WRITE_LIFE_EXTREME
>>> extention list  -' -'
>>> WRITE_LIFE_EXTREME  -' -'
>>> WRITE_LIFE_SHORTHOT_DATA   WRITE_LIFE_SHORT
>>>
>>> -- buffered_io
>>> WRITE_LIFE_NOT_SET  WARM_DATA  WRITE_LIFE_LONG
>>> WRITE_LIFE_NONE -' -'
>>> WRITE_LIFE_MEDIUM   -' -'
>>> WRITE_LIFE_LONG -' -'
>>>
>>> -- direct_io (Not recommendable)
>>> WRITE_LIFE_NOT_SET  WARM_DATA  WRITE_LIFE_NOT_SET
>>> WRITE_LIFE_NONE -' WRITE_LIFE_NONE
>>> WRITE_LIFE_MEDIUM   -' WRITE_LIFE_MEDIUM
>>> WRITE_LIFE_LONG -' WRITE_LIFE_LONG
>>
>> Agreed with above IO hint mapping rule.
>>
>>>
>>> 2) w/o fs_iohints
>>>
>>> UserF2FS   Block
>>> ---
>>> Meta   -
>>> HOT_NODE   -
>>> WARM_NODE  -
>>> COLD_NODE  -
>>> ioctl(cold) COLD_DATA  -
>>> extention list  -' -
>>>
>>> -- buffered_io
>>> WRITE_LIFE_EXTREME  COLD_DATA  -
>>> WRITE_LIFE_SHORTHOT_DATA   -
>>> WRITE_LIFE_NOT_SET  WARM_DATA  -
>>> WRITE_LIFE_NONE -' -
>>> WRITE_LIFE_MEDIUM   -' -
>>> WRITE_LIFE_LONG -' -
>>
>> Now we recommend direct_io if user wants to give IO hint for storage, I 
>> suspect
>> that user would suffer performance regression issue w/o buffered IO.
>>
>> Another problem is that, now, in Android, it will be very hard to prompt
>> application to migrate their IO pattern from buffered IO to direct IO, one
>> possible way is distinguishing user data lifetime from FWK, e.g. set
>> WRITE_LIFE_SHORT for cache file or tmp file, set WRITE_LIFE_EXTREME for 
>> media file.
>>
>> In order to support buffered_io, would it be better to change mapping as 
>> below?
>>
>> -- buffered_io
>> WRITE_LIFE_EXTREME  COLD_DATA  WRITE_LIFE_EXTREME
>> WRITE_LIFE_SHORTHOT_DATA   WRITE_LIFE_SHORT
>> WRITE_LIFE_NOT_SET  WARM_DATA  WRITE_LIFE_NOT_SET
>> WRITE_LIFE_NONE -' -'
>> WRITE_LIFE_MEDIUM   -' -'
>> WRITE_LIFE_LONG -' -'
> 
> Agreed, and it makes more sense that we'd better keep the write hints on
> userdata given by applications.
> 
> BTW, since we couldn't get any performance numbers with these, how about
> adding a mount option like "-o iohints=MODE" where MODE may be one of
> "fs-based", 

Re: [PATCH v2] gpio: winbond: add driver

2017-12-27 Thread William Breathitt Gray
On Wed, Dec 27, 2017 at 07:42:21PM +0100, Maciej S. Szmigiero wrote:
>On 27.12.2017 01:24, William Breathitt Gray wrote:
>> On Mon, Dec 25, 2017 at 03:48:16PM +0100, Maciej S. Szmigiero wrote:
>(..)
>>> All the existing ISA bus drivers seem to depend on CONFIG_ISA_BUS_API
>>> instead of selecting it but IMHO this is wrong because:
>>> 1) This Kconfig option doesn't really enable or disable any bus support
>>> but building of a library of some common boilerplate code.
>>> Libraries are normally selected by drivers needing them and only provided
>>> as an user-selectable option if there is a possibility that a out-of-tree
>>> module would need it,
>>>
>>> 2) On x86_64 this option (or rather, its parent option CONFIG_ISA_BUS)
>>> cannot be enabled without CONFIG_EXPERT,
>>>
>>> 3) This device isn't really a ISA bus device any more than, for example,
>>> a 8250 serial port or a PC-style parallel port and these don't need
>>> that an user explicitly enables "ISA bus support" in his kernel
>>> configuration.
>> 
>> I can see what you mean about selecting ISA_BUS_API rather than having
>> it as a dependency for drivers. Part of the reason I added the
>> CONFIG_EXPERT dependency for CONFIG_ISA_BUS -- as well as having
>> CONFIG_ISA_BUS_API be a dependency for the drivers themselves -- was to
>> hide the ISA-style drivers which blindly poke at I/O port addresses,
>> lest a niave user enable all available drivers and unintentionally brick
>> their system when the drivers execute.
>> 
>> I think there is still merit in masking dangerous drivers such as this,
>> since the expected behavior nowadays is for the driver to probe for the
>> device before poking at memory; since ISA-style communication lacks a
>> standard method of detecting devices in hardware, these devices
>> generally pose a danger when loaded by niave users.
>
>This driver accesses the same Super I/O chip as w83627ehf hwmon and
>w83627hf_wdt watchdog drivers.
>In addition to this, there are loads of other hwmon and watchdog drivers
>for x86 Super I/Os in the tree, most of them using the same probing and
>communication style.
>There are even existing GPIO drivers for some Super I/Os like gpio-it87
>and gpio-f7188x.
>
>None of these drivers need CONFIG_EXPERT to be selected.
>
>Also, CONFIG_EXPERT is described as "Configure standard kernel features"
>and that "[it] allows certain base kernel options and settings to be
>disabled or tweaked" for "specialized environments".
>Enabling this driver is not about changing "standard kernel feature" or
>a "base kernel option [or] setting".

I'm sorry, I didn't make it quite clear in my previous reply. I agree
with you that CONFIG_EXPERT shouldn't be necessary for this driver -- in
the end, a select ISA_BUS_API line should be all that's needed to have
ISA bus driver support for your driver.

My reference to the CONFIG_EXPERT option is for masking options related
to other ISA-style buses not commonly found in desktop systems. Devices
like the Super I/O chip wouldn't fall into this category since LPC is
pretty common and the relevant I/O port addresses are usually well
known. Ultimately, the Winbond GPIO driver should not need CONFIG_EXPERT
to be enabled in order to select ISA_BUS_API.

William Breathitt Gray

>> 
>> William Breathitt Gray
>
>Maciej Szmigiero


Re: [PATCH v2] gpio: winbond: add driver

2017-12-27 Thread William Breathitt Gray
On Wed, Dec 27, 2017 at 07:42:21PM +0100, Maciej S. Szmigiero wrote:
>On 27.12.2017 01:24, William Breathitt Gray wrote:
>> On Mon, Dec 25, 2017 at 03:48:16PM +0100, Maciej S. Szmigiero wrote:
>(..)
>>> All the existing ISA bus drivers seem to depend on CONFIG_ISA_BUS_API
>>> instead of selecting it but IMHO this is wrong because:
>>> 1) This Kconfig option doesn't really enable or disable any bus support
>>> but building of a library of some common boilerplate code.
>>> Libraries are normally selected by drivers needing them and only provided
>>> as an user-selectable option if there is a possibility that a out-of-tree
>>> module would need it,
>>>
>>> 2) On x86_64 this option (or rather, its parent option CONFIG_ISA_BUS)
>>> cannot be enabled without CONFIG_EXPERT,
>>>
>>> 3) This device isn't really a ISA bus device any more than, for example,
>>> a 8250 serial port or a PC-style parallel port and these don't need
>>> that an user explicitly enables "ISA bus support" in his kernel
>>> configuration.
>> 
>> I can see what you mean about selecting ISA_BUS_API rather than having
>> it as a dependency for drivers. Part of the reason I added the
>> CONFIG_EXPERT dependency for CONFIG_ISA_BUS -- as well as having
>> CONFIG_ISA_BUS_API be a dependency for the drivers themselves -- was to
>> hide the ISA-style drivers which blindly poke at I/O port addresses,
>> lest a niave user enable all available drivers and unintentionally brick
>> their system when the drivers execute.
>> 
>> I think there is still merit in masking dangerous drivers such as this,
>> since the expected behavior nowadays is for the driver to probe for the
>> device before poking at memory; since ISA-style communication lacks a
>> standard method of detecting devices in hardware, these devices
>> generally pose a danger when loaded by niave users.
>
>This driver accesses the same Super I/O chip as w83627ehf hwmon and
>w83627hf_wdt watchdog drivers.
>In addition to this, there are loads of other hwmon and watchdog drivers
>for x86 Super I/Os in the tree, most of them using the same probing and
>communication style.
>There are even existing GPIO drivers for some Super I/Os like gpio-it87
>and gpio-f7188x.
>
>None of these drivers need CONFIG_EXPERT to be selected.
>
>Also, CONFIG_EXPERT is described as "Configure standard kernel features"
>and that "[it] allows certain base kernel options and settings to be
>disabled or tweaked" for "specialized environments".
>Enabling this driver is not about changing "standard kernel feature" or
>a "base kernel option [or] setting".

I'm sorry, I didn't make it quite clear in my previous reply. I agree
with you that CONFIG_EXPERT shouldn't be necessary for this driver -- in
the end, a select ISA_BUS_API line should be all that's needed to have
ISA bus driver support for your driver.

My reference to the CONFIG_EXPERT option is for masking options related
to other ISA-style buses not commonly found in desktop systems. Devices
like the Super I/O chip wouldn't fall into this category since LPC is
pretty common and the relevant I/O port addresses are usually well
known. Ultimately, the Winbond GPIO driver should not need CONFIG_EXPERT
to be enabled in order to select ISA_BUS_API.

William Breathitt Gray

>> 
>> William Breathitt Gray
>
>Maciej Szmigiero


Re: [PATCH V8 3/3] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values

2017-12-27 Thread Viresh Kumar
On 27-12-17, 15:54, Rob Herring wrote:
> On Wed, Dec 27, 2017 at 2:56 AM, Viresh Kumar  wrote:
> > On 26-12-17, 14:29, Rob Herring wrote:
> >> On Mon, Dec 18, 2017 at 03:51:30PM +0530, Viresh Kumar wrote:
> >
> >> > +On some platforms the exact frequency or voltage may be hidden from the 
> >> > OS by
> >> > +the firmware and the "opp-hz" or the "opp-microvolt" properties may 
> >> > contain
> >> > +magic values that represent the frequency or voltage in a firmware 
> >> > dependent
> >> > +way, for example an index of an array in the firmware.
> >>
> >> I'm still not convinced this is a good idea.
> >
> > You were kind-of a few days back :)
> >
> > lkml.kernel.org/r/CAL_JsqK-qtAaM_Ou5NtxcWR3F_q=8rmpjum-vqgtkhbtwe5...@mail.gmail.com
> 
> Yeah, well that was before Stephen said anything.
> 
> > So here is the deal:
> >
> > - I proposed "domain-performance-state" property for this stuff
> >   initially.
> > - But Kevin didn't like that and proposed reusing "opp-hz" and
> >   "opp-microvolt", which we all agreed to multiple times..
> > - And we are back to the same discussion now and its painful and time
> >   killing for all of us.
> 
> There's bigger issues than where we put magic values as I raised in
> the other patch.
> 
> > TBH, I don't have too strong preferences about any of the suggestions
> > you guys have and I need you guys to tell me what binding changes to
> > do here and I will do that.
> >
> >> If you have firmware
> >> partially managing things, then I think we should have platform specific
> >> bindings or drivers.
> >
> > What about the initial idea then, like "performance-state" for the
> > power domains ? All platforms will anyway replicate that binding only.
> 
> I don't really know. I don't really care either. I'll probably go
> along with what everyone agrees to, but the only one I see any
> agreement from is Ulf. Also, it is pretty vague as to what platforms
> will use this. You claimed you can support QCom scenarios, but there's
> really no evidence that that is true.

Well, I sent out the code few days back based on these bindings and everyone can
see how these bindings will get used now.

> What I don't want to see is this
> merged and then we need something more yet again in a few months for
> another platform.

Sure, I get your concerns.

So what we need now is:

- Stephen to start responding and clarify all the doubts he had as being silent
  isn't helping.

- Or Rajendra to post patches which can prove that this is usable. The last time
  I had a chat with him, he confirmed that he will post patches after 4.15-rc1
  and he should have posted them by now, but he didn't :(

-- 
viresh


Re: [PATCH V8 3/3] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values

2017-12-27 Thread Viresh Kumar
On 27-12-17, 15:54, Rob Herring wrote:
> On Wed, Dec 27, 2017 at 2:56 AM, Viresh Kumar  wrote:
> > On 26-12-17, 14:29, Rob Herring wrote:
> >> On Mon, Dec 18, 2017 at 03:51:30PM +0530, Viresh Kumar wrote:
> >
> >> > +On some platforms the exact frequency or voltage may be hidden from the 
> >> > OS by
> >> > +the firmware and the "opp-hz" or the "opp-microvolt" properties may 
> >> > contain
> >> > +magic values that represent the frequency or voltage in a firmware 
> >> > dependent
> >> > +way, for example an index of an array in the firmware.
> >>
> >> I'm still not convinced this is a good idea.
> >
> > You were kind-of a few days back :)
> >
> > lkml.kernel.org/r/CAL_JsqK-qtAaM_Ou5NtxcWR3F_q=8rmpjum-vqgtkhbtwe5...@mail.gmail.com
> 
> Yeah, well that was before Stephen said anything.
> 
> > So here is the deal:
> >
> > - I proposed "domain-performance-state" property for this stuff
> >   initially.
> > - But Kevin didn't like that and proposed reusing "opp-hz" and
> >   "opp-microvolt", which we all agreed to multiple times..
> > - And we are back to the same discussion now and its painful and time
> >   killing for all of us.
> 
> There's bigger issues than where we put magic values as I raised in
> the other patch.
> 
> > TBH, I don't have too strong preferences about any of the suggestions
> > you guys have and I need you guys to tell me what binding changes to
> > do here and I will do that.
> >
> >> If you have firmware
> >> partially managing things, then I think we should have platform specific
> >> bindings or drivers.
> >
> > What about the initial idea then, like "performance-state" for the
> > power domains ? All platforms will anyway replicate that binding only.
> 
> I don't really know. I don't really care either. I'll probably go
> along with what everyone agrees to, but the only one I see any
> agreement from is Ulf. Also, it is pretty vague as to what platforms
> will use this. You claimed you can support QCom scenarios, but there's
> really no evidence that that is true.

Well, I sent out the code few days back based on these bindings and everyone can
see how these bindings will get used now.

> What I don't want to see is this
> merged and then we need something more yet again in a few months for
> another platform.

Sure, I get your concerns.

So what we need now is:

- Stephen to start responding and clarify all the doubts he had as being silent
  isn't helping.

- Or Rajendra to post patches which can prove that this is usable. The last time
  I had a chat with him, he confirmed that he will post patches after 4.15-rc1
  and he should have posted them by now, but he didn't :(

-- 
viresh


Re: [PATCH] Documentation: gpu/i915: rename intel_guc_loader.c

2017-12-27 Thread Joey Pabalinas
On Wed, Dec 27, 2017 at 08:19:49PM -0800, Randy Dunlap wrote:
> https://marc.info/?l=dri-devel=151237442131113=2
> 
> Already applied to drm/ tree.

Gotcha; cheers.

-- 
Joey Pabalinas


signature.asc
Description: PGP signature


Re: [PATCH] Documentation: gpu/i915: rename intel_guc_loader.c

2017-12-27 Thread Joey Pabalinas
On Wed, Dec 27, 2017 at 08:19:49PM -0800, Randy Dunlap wrote:
> https://marc.info/?l=dri-devel=151237442131113=2
> 
> Already applied to drm/ tree.

Gotcha; cheers.

-- 
Joey Pabalinas


signature.asc
Description: PGP signature


Re: [PATCH v2] gpio: winbond: add driver

2017-12-27 Thread William Breathitt Gray
On Wed, Dec 27, 2017 at 12:16:53PM +0100, Linus Walleij wrote:
>On Wed, Dec 27, 2017 at 1:24 AM, William Breathitt Gray
> wrote:
>
>> The issue seems to relate to the select GPIOLIB line for the STX104
>> Kconfig option (which has a ISA_BUS_API dependency). Switching GPIOLIB
>> to be a dependency, or alternatively selecting ISA_BUS_API, alleviates
>> the recursion.
>>
>> Linus, is my use of select GPIOLIB for the STX104 Kconfig option
>> appropriate in this context -- or should it instead be part of the
>> depends on line? The STX104 driver includes linux/gpio/driver.h and
>> makes use of the devm_gpiochip_add_data function to add support for some
>> minor auxililary GPIO lines on the STX104 device.
>
>In the STX104 case, it seems to be appropriate to
>select GPIOLIB, as it is a GPIO provider, not consumer.
>
>Usually I prefer that drivers just select what they need so I don't
>have to run around in the whole kernel tree and turn things on
>to the left and right before I can finally select my driver, but
>maybe that is just me.
>
>The other ISA GPIO drivers depends on ISA_BUS_API, I guess
>in difference from the symbol GPIOLIB it cannot be universally
>selected, so shouldn't this driver also just depends on ISA_BUS_API
>and select it from the machine or wherever?

When I initially submitted the PC/104 device drivers, I added
ISA_BUS_API to their depends on lines in order to mask the respective
drivers from users who do not have a PC/104 bus on their system; in
retrospect, I shouldn't have done it this way. I later added a proper
CONFIG_PC104 option to achieve the masking I initially intended.

The correct semantic would be to treat ISA_BUS_API as we do GPIOLIB:
allow drivers to select it when needed. The reason is that
CONFIG_ISA_BUS_API simply enables the compilation of the
drivers/base/isa.c file. This file has no other Kconfig dependencies and
simply provides a thin layer of code to eliminate some boilerplate
common to ISA-style device drivers; no hardware interactions occur
within this code, just pure software abstractions.

Given that CONFIG_PC104 now fulfills the original purpose of adding
ISA_BUS_API to the depends on line for the PC/104 device drivers, and
that ISA_BUS_API can be selected as necessary with no danger to the
integrity of the system, I think it would be appropriate to change all
the existing ISA_BUS_API depends on lines to respective select lines.
Does that logic make sense?

William Breathitt Gray

>
>Yours,
>Linus Walleij


Re: [PATCH v2] gpio: winbond: add driver

2017-12-27 Thread William Breathitt Gray
On Wed, Dec 27, 2017 at 12:16:53PM +0100, Linus Walleij wrote:
>On Wed, Dec 27, 2017 at 1:24 AM, William Breathitt Gray
> wrote:
>
>> The issue seems to relate to the select GPIOLIB line for the STX104
>> Kconfig option (which has a ISA_BUS_API dependency). Switching GPIOLIB
>> to be a dependency, or alternatively selecting ISA_BUS_API, alleviates
>> the recursion.
>>
>> Linus, is my use of select GPIOLIB for the STX104 Kconfig option
>> appropriate in this context -- or should it instead be part of the
>> depends on line? The STX104 driver includes linux/gpio/driver.h and
>> makes use of the devm_gpiochip_add_data function to add support for some
>> minor auxililary GPIO lines on the STX104 device.
>
>In the STX104 case, it seems to be appropriate to
>select GPIOLIB, as it is a GPIO provider, not consumer.
>
>Usually I prefer that drivers just select what they need so I don't
>have to run around in the whole kernel tree and turn things on
>to the left and right before I can finally select my driver, but
>maybe that is just me.
>
>The other ISA GPIO drivers depends on ISA_BUS_API, I guess
>in difference from the symbol GPIOLIB it cannot be universally
>selected, so shouldn't this driver also just depends on ISA_BUS_API
>and select it from the machine or wherever?

When I initially submitted the PC/104 device drivers, I added
ISA_BUS_API to their depends on lines in order to mask the respective
drivers from users who do not have a PC/104 bus on their system; in
retrospect, I shouldn't have done it this way. I later added a proper
CONFIG_PC104 option to achieve the masking I initially intended.

The correct semantic would be to treat ISA_BUS_API as we do GPIOLIB:
allow drivers to select it when needed. The reason is that
CONFIG_ISA_BUS_API simply enables the compilation of the
drivers/base/isa.c file. This file has no other Kconfig dependencies and
simply provides a thin layer of code to eliminate some boilerplate
common to ISA-style device drivers; no hardware interactions occur
within this code, just pure software abstractions.

Given that CONFIG_PC104 now fulfills the original purpose of adding
ISA_BUS_API to the depends on line for the PC/104 device drivers, and
that ISA_BUS_API can be selected as necessary with no danger to the
integrity of the system, I think it would be appropriate to change all
the existing ISA_BUS_API depends on lines to respective select lines.
Does that logic make sense?

William Breathitt Gray

>
>Yours,
>Linus Walleij


Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-27 Thread Alexei Starovoitov

On 12/27/17 8:16 PM, Steven Rostedt wrote:

On Wed, 27 Dec 2017 19:45:42 -0800
Alexei Starovoitov  wrote:


I don't think that's the case. My reading of current
trace_kprobe_ftrace() -> arch_check_ftrace_location()
is that it will not be true for old mcount case.


In the old mcount case, you can't use ftrace to return without calling
the function. That is, no modification of the return ip, unless you
created a trampoline that could handle arbitrary stack frames, and
remove them from the stack before returning back to the function.


correct. I was saying that trace_kprobe_ftrace() won't let us do
bpf_override_return with old mcount.



As far as the rest of your arguments it very much puzzles me that
you claim that this patch suppose to work based on historical
reasoning whereas you did NOT test it.


I believe that Masami is saying that the modification of the IP from
kprobes has been very well tested. But I'm guessing that you still want
a test case for using kprobes in this particular instance. It's not the
implementation of modifying the IP that you are worried about, but the
implementation of BPF using it in this case. Right?


exactly. No doubt that old code works.
But it doesn't mean that bpf_override_return() will continue to
work in kprobes that are not ftrace based.
I suspect Josef's existing test case will cover this situation.
Probably only special .config is needed to disable ftrace, so
"kprobe on entry but not ftrace" check will kick in.
But I didn't get an impression that this situation was tested.
Instead I see only logical reasoning that it's _supposed_ to work.
That's not enough.



Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-27 Thread Alexei Starovoitov

On 12/27/17 8:16 PM, Steven Rostedt wrote:

On Wed, 27 Dec 2017 19:45:42 -0800
Alexei Starovoitov  wrote:


I don't think that's the case. My reading of current
trace_kprobe_ftrace() -> arch_check_ftrace_location()
is that it will not be true for old mcount case.


In the old mcount case, you can't use ftrace to return without calling
the function. That is, no modification of the return ip, unless you
created a trampoline that could handle arbitrary stack frames, and
remove them from the stack before returning back to the function.


correct. I was saying that trace_kprobe_ftrace() won't let us do
bpf_override_return with old mcount.



As far as the rest of your arguments it very much puzzles me that
you claim that this patch suppose to work based on historical
reasoning whereas you did NOT test it.


I believe that Masami is saying that the modification of the IP from
kprobes has been very well tested. But I'm guessing that you still want
a test case for using kprobes in this particular instance. It's not the
implementation of modifying the IP that you are worried about, but the
implementation of BPF using it in this case. Right?


exactly. No doubt that old code works.
But it doesn't mean that bpf_override_return() will continue to
work in kprobes that are not ftrace based.
I suspect Josef's existing test case will cover this situation.
Probably only special .config is needed to disable ftrace, so
"kprobe on entry but not ftrace" check will kick in.
But I didn't get an impression that this situation was tested.
Instead I see only logical reasoning that it's _supposed_ to work.
That's not enough.



Re: [RFC V7 2/2] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values

2017-12-27 Thread Viresh Kumar
On 27-12-17, 15:36, Rob Herring wrote:
> On Tue, Dec 26, 2017 at 10:45 PM, Viresh Kumar  
> wrote:
> > On 26-12-17, 14:23, Rob Herring wrote:
> >> > cpu_opp_table: cpu_opp_table {
> >> > compatible = "operating-points-v2";
> >> > opp-shared;
> >> >
> >> > opp00 {
> >> > opp-hz = /bits/ 64 <20800>;
> >> > clock-latency-ns = <50>;
> >> > power-domain-opp = <_opp_1>;
> >>
> >> What is this? opp00 here is not a device. One OPP should not point to
> >> another. "power-domain-opp" is only supposed to appear in devices
> >> alongside power-domains properties.
> >
> > There are two type of devices:
> >
> > A.) With fixed performance state requirements and they will have the
> > new "required-opp" property in the device node itself as you said.
> >
> > B.) Devices which can do DVFS (CPU, MMC, LCD, etc) and those may need
> > a different performance state of the domain for their individual OPPs
> > and so we can't have this property in the device all the time.
> >
> > Does this make sense ?
> 
> No. From the definition for power-domain-opp
> 
> "+- power-domain-opp: This contains phandle to one of the OPP nodes of
> the master
> +  power domain. This specifies the minimum required OPP of the master
> domain for
> +  the functioning of the device in this OPP (where this property is present).

The per-opp thing was mentioned here.

> +  This property can only be set for a device if the device node contains the
> +  "power-domains" property.

This was trying to say something else, though it wasn't clear and so your
concerns.

I wanted to say that the device node or its OPP nodes can have the
"power-domain-opp" property only if the device node has a "power-domains"
property. i.e. you need to have power domain first and then only the
power-domain-opp property.

> Also, either all or none of the OPP nodes in an OPP
> +  table should have it set."
> 
> In the above example, you are violating the next to last sentence.
> 
> Though, I'm now confused by what the last sentence means.

Yeah, lets leave it as is as the V8 has changed this significantly and you
already Acked it :)

-- 
viresh


Re: [RFC V7 2/2] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values

2017-12-27 Thread Viresh Kumar
On 27-12-17, 15:36, Rob Herring wrote:
> On Tue, Dec 26, 2017 at 10:45 PM, Viresh Kumar  
> wrote:
> > On 26-12-17, 14:23, Rob Herring wrote:
> >> > cpu_opp_table: cpu_opp_table {
> >> > compatible = "operating-points-v2";
> >> > opp-shared;
> >> >
> >> > opp00 {
> >> > opp-hz = /bits/ 64 <20800>;
> >> > clock-latency-ns = <50>;
> >> > power-domain-opp = <_opp_1>;
> >>
> >> What is this? opp00 here is not a device. One OPP should not point to
> >> another. "power-domain-opp" is only supposed to appear in devices
> >> alongside power-domains properties.
> >
> > There are two type of devices:
> >
> > A.) With fixed performance state requirements and they will have the
> > new "required-opp" property in the device node itself as you said.
> >
> > B.) Devices which can do DVFS (CPU, MMC, LCD, etc) and those may need
> > a different performance state of the domain for their individual OPPs
> > and so we can't have this property in the device all the time.
> >
> > Does this make sense ?
> 
> No. From the definition for power-domain-opp
> 
> "+- power-domain-opp: This contains phandle to one of the OPP nodes of
> the master
> +  power domain. This specifies the minimum required OPP of the master
> domain for
> +  the functioning of the device in this OPP (where this property is present).

The per-opp thing was mentioned here.

> +  This property can only be set for a device if the device node contains the
> +  "power-domains" property.

This was trying to say something else, though it wasn't clear and so your
concerns.

I wanted to say that the device node or its OPP nodes can have the
"power-domain-opp" property only if the device node has a "power-domains"
property. i.e. you need to have power domain first and then only the
power-domain-opp property.

> Also, either all or none of the OPP nodes in an OPP
> +  table should have it set."
> 
> In the above example, you are violating the next to last sentence.
> 
> Though, I'm now confused by what the last sentence means.

Yeah, lets leave it as is as the V8 has changed this significantly and you
already Acked it :)

-- 
viresh


Re: [PATCH 04/12] ext2: drop unneeded newline

2017-12-27 Thread Theodore Ts'o
On Wed, Dec 27, 2017 at 03:51:37PM +0100, Julia Lawall wrote:
> ext2_msg prints a newline at the end of the message string, so the message
> string does not need to include a newline explicitly.  Done using
> Coccinelle.
> 
> Signed-off-by: Julia Lawall 

Reviewed-by: Theodore Ts'o 

- Ted


Re: [PATCH 04/12] ext2: drop unneeded newline

2017-12-27 Thread Theodore Ts'o
On Wed, Dec 27, 2017 at 03:51:37PM +0100, Julia Lawall wrote:
> ext2_msg prints a newline at the end of the message string, so the message
> string does not need to include a newline explicitly.  Done using
> Coccinelle.
> 
> Signed-off-by: Julia Lawall 

Reviewed-by: Theodore Ts'o 

- Ted


Re: [RFC PATCH v11 4/5] PCI / PM: Add support for the PCIe WAKE# signal for OF

2017-12-27 Thread Tony Lindgren
* Rafael J. Wysocki  [171228 00:51]:
> On Wed, Dec 27, 2017 at 4:08 PM, Tony Lindgren  wrote:
> > * Rafael J. Wysocki  [171227 01:00]:
> >> On Tuesday, December 26, 2017 2:06:47 AM CET JeffyChen wrote:
> >> > Hi Rafael,
> >> >
> >> > Thanks for your reply :)
> >> >
> >> > On 12/26/2017 08:11 AM, Rafael J. Wysocki wrote:
> >> > >> >+
> >> > >> >+   dn = pci_device_to_OF_node(ppdev);
> >> > >> >+   if (!dn)
> >> > >> >+   return 0;
> >> > >> >+
> >> > >> >+   irq = of_irq_get_byname(dn, "wakeup");
> >> > > Why is this a property of the bridge and not of the device itself?
> >> >
> >> > That is suggested by Brian, because in that way, the wakeup pin would
> >> > not "tied to what exact device is installed (or no device, if it's a 
> >> > slot)."
> >>
> >> But I don't think it works when there are two devices using different WAKE#
> >> interrupt lines under the same bridge.  Or how does it work then?
> >
> > It won't work currently for multiple devices but adding more than
> > one wakeriq per device is doable. And I think we will have other
> > cases where multiple wakeirqs are connected to a single device, so
> > that issue should be sorted out sooner or later.
> >
> > And if requesting wakeirq for the PCI WAKE# lines at the PCI
> > controller does the job, then maybe that's all we need to start with.
> 
> These are expected to be out-of-band, so not having anything to do
> with the Root Complex.
> 
> In-band PME Messages go through the PCIe hierarchy, but that is a
> standard mechanism and it is supported already.
> 
> WAKE# are platform-specific, pretty much by definition and I guess
> that on most ARM boards they are just going to be some kind of GPIO
> pins.

OK. So probably supporting the following two configurations
should be enough then:

1. One or more WAKE# lines configured as a wakeirq for the PCI
   controller

   When the wakeirq calls pm_wakeup_event() for the PCI controller
   device driver, the PCI controller wakes up and can deal with
   it's child devices

2. Optionally a WAKE# line from a PCI device configured as wakeirq
   for the PCI device driver

   In this case calling the PM runtime resume in the child
   PCI device will also wake up the parent PCI controller,
   and then the PCI controller can deal with it's children

Seems like this series is pretty close to 1 above except
we need to have a list of wakeirqs per device instead of
just one. And option 2 should already work as long as the
PCI device driver parses and configures the wakeirq.

> > Then in addition to that, we could do the following to allow
> > PCI devices to request the wakeirq from the PCI controller:
> >
> > 1. PCI controller or framework implements a chained irq for
> >the WAKE# lines assuming it can mask/unmask the WAKE# lines
> >
> > 2. PCI devices then can just request the wakeirq from the PCI
> >controller
> >
> > And that's about it. Optionally we could leave out the dependency
> > to having PCI devices implement PM runtime and just resume the
> > parent (PCI controller) if PCI devices has not implemented
> > PM runtime.
> 
> So if my understanding is correct, DT should give you the WAKE# IRQ
> for the given endpoint PCI device and you only are expected to request
> it.   The rest should just follow from the other pieces of information
> in the DT.

Yeah and it seems that we should allow configuring both cases
1 and 2 above.

> With the quite obvious caveat that the same IRQ may be used as WAKE#
> for multiple endpoint devices (which BTW need not be under the same
> bridge even).

And with the shared interrupts we can't do the masking/unmasking
automatically..

> >> > >> >+   if (irq == -EPROBE_DEFER)
> >> > > Braces here, please.
> >> > ok, will fix in the next version.
> >> >
> >> > >
> >> > >> >+   return irq;
> >> > >> >+   /* Ignore other errors, since a missing wakeup is non-fatal. 
> >> > >> >*/
> >> > >> >+   else if (irq < 0) {
> >> > >> >+   dev_info(>dev, "cannot get wakeup interrupt: 
> >> > >> >%d\n", irq);
> >> > >> >+   return 0;
> >> > >> >+   }
> >> > >> >+
> >> > >> >+   device_init_wakeup(>dev, true);
> >> > > Why do you call this before dev_pm_set_dedicated_wake_irq()?
> >> >
> >> > hmmm, i thought so too, but it turns out the dedicated wake irq
> >> > framework requires device_init_wakeup(dev, true) before attach the wake 
> >> > irq:
> >> >
> >> > int device_wakeup_attach_irq(struct device *dev,
> >> >   struct wake_irq *wakeirq)
> >> > {
> >> >  struct wakeup_source *ws;
> >> >
> >> >  ws = dev->power.wakeup;
> >> >  if (!ws) {
> >> >  dev_err(dev, "forgot to call device_init_wakeup?\n");
> >> >  return -EINVAL;
> >> >
> >>
> >> Well, that's a framework issue, fair enough.
> >>
> >> That said, what if user space removes the wakeup source from under you
> >> 

Re: [RFC PATCH v11 4/5] PCI / PM: Add support for the PCIe WAKE# signal for OF

2017-12-27 Thread Tony Lindgren
* Rafael J. Wysocki  [171228 00:51]:
> On Wed, Dec 27, 2017 at 4:08 PM, Tony Lindgren  wrote:
> > * Rafael J. Wysocki  [171227 01:00]:
> >> On Tuesday, December 26, 2017 2:06:47 AM CET JeffyChen wrote:
> >> > Hi Rafael,
> >> >
> >> > Thanks for your reply :)
> >> >
> >> > On 12/26/2017 08:11 AM, Rafael J. Wysocki wrote:
> >> > >> >+
> >> > >> >+   dn = pci_device_to_OF_node(ppdev);
> >> > >> >+   if (!dn)
> >> > >> >+   return 0;
> >> > >> >+
> >> > >> >+   irq = of_irq_get_byname(dn, "wakeup");
> >> > > Why is this a property of the bridge and not of the device itself?
> >> >
> >> > That is suggested by Brian, because in that way, the wakeup pin would
> >> > not "tied to what exact device is installed (or no device, if it's a 
> >> > slot)."
> >>
> >> But I don't think it works when there are two devices using different WAKE#
> >> interrupt lines under the same bridge.  Or how does it work then?
> >
> > It won't work currently for multiple devices but adding more than
> > one wakeriq per device is doable. And I think we will have other
> > cases where multiple wakeirqs are connected to a single device, so
> > that issue should be sorted out sooner or later.
> >
> > And if requesting wakeirq for the PCI WAKE# lines at the PCI
> > controller does the job, then maybe that's all we need to start with.
> 
> These are expected to be out-of-band, so not having anything to do
> with the Root Complex.
> 
> In-band PME Messages go through the PCIe hierarchy, but that is a
> standard mechanism and it is supported already.
> 
> WAKE# are platform-specific, pretty much by definition and I guess
> that on most ARM boards they are just going to be some kind of GPIO
> pins.

OK. So probably supporting the following two configurations
should be enough then:

1. One or more WAKE# lines configured as a wakeirq for the PCI
   controller

   When the wakeirq calls pm_wakeup_event() for the PCI controller
   device driver, the PCI controller wakes up and can deal with
   it's child devices

2. Optionally a WAKE# line from a PCI device configured as wakeirq
   for the PCI device driver

   In this case calling the PM runtime resume in the child
   PCI device will also wake up the parent PCI controller,
   and then the PCI controller can deal with it's children

Seems like this series is pretty close to 1 above except
we need to have a list of wakeirqs per device instead of
just one. And option 2 should already work as long as the
PCI device driver parses and configures the wakeirq.

> > Then in addition to that, we could do the following to allow
> > PCI devices to request the wakeirq from the PCI controller:
> >
> > 1. PCI controller or framework implements a chained irq for
> >the WAKE# lines assuming it can mask/unmask the WAKE# lines
> >
> > 2. PCI devices then can just request the wakeirq from the PCI
> >controller
> >
> > And that's about it. Optionally we could leave out the dependency
> > to having PCI devices implement PM runtime and just resume the
> > parent (PCI controller) if PCI devices has not implemented
> > PM runtime.
> 
> So if my understanding is correct, DT should give you the WAKE# IRQ
> for the given endpoint PCI device and you only are expected to request
> it.   The rest should just follow from the other pieces of information
> in the DT.

Yeah and it seems that we should allow configuring both cases
1 and 2 above.

> With the quite obvious caveat that the same IRQ may be used as WAKE#
> for multiple endpoint devices (which BTW need not be under the same
> bridge even).

And with the shared interrupts we can't do the masking/unmasking
automatically..

> >> > >> >+   if (irq == -EPROBE_DEFER)
> >> > > Braces here, please.
> >> > ok, will fix in the next version.
> >> >
> >> > >
> >> > >> >+   return irq;
> >> > >> >+   /* Ignore other errors, since a missing wakeup is non-fatal. 
> >> > >> >*/
> >> > >> >+   else if (irq < 0) {
> >> > >> >+   dev_info(>dev, "cannot get wakeup interrupt: 
> >> > >> >%d\n", irq);
> >> > >> >+   return 0;
> >> > >> >+   }
> >> > >> >+
> >> > >> >+   device_init_wakeup(>dev, true);
> >> > > Why do you call this before dev_pm_set_dedicated_wake_irq()?
> >> >
> >> > hmmm, i thought so too, but it turns out the dedicated wake irq
> >> > framework requires device_init_wakeup(dev, true) before attach the wake 
> >> > irq:
> >> >
> >> > int device_wakeup_attach_irq(struct device *dev,
> >> >   struct wake_irq *wakeirq)
> >> > {
> >> >  struct wakeup_source *ws;
> >> >
> >> >  ws = dev->power.wakeup;
> >> >  if (!ws) {
> >> >  dev_err(dev, "forgot to call device_init_wakeup?\n");
> >> >  return -EINVAL;
> >> >
> >>
> >> Well, that's a framework issue, fair enough.
> >>
> >> That said, what if user space removes the wakeup source from under you
> >> concurrently via sysfs?  Tony?
> >
> > Hmm sounds racy, need to 

Re: [PATCH] Documentation: gpu/i915: rename intel_guc_loader.c

2017-12-27 Thread Randy Dunlap
On 12/27/2017 07:28 PM, Joey Pabalinas wrote:
> Commit e8668bbcb0f91c7baa ("drm/i915/guc: Rename intel_guc_loader.c to
> intel_guc_fw.c") renamed drivers/gpu/drm/i915/intel_guc_loader.c
> but didn't update Documentation/gpu/i915.rst. Change intel_guc_loader.c
> to intel_guc_fw.c in Documentation/gpu/i915.rst.
> 
> Signed-off-by: Joey Pabalinas 

https://marc.info/?l=dri-devel=151237442131113=2

Already applied to drm/ tree.

> ---
>  Documentation/gpu/i915.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 2e7ee0313c1cd6c037..e21698e16534de9ccc 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -341,10 +341,10 @@ GuC
>  GuC-specific firmware loader
>  
>  
> -.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c
> +.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c
> :doc: GuC-specific firmware loader
>  
> -.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c
> +.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c
> :internal:
>  
>  GuC-based command submission
> 


-- 
~Randy


Re: [PATCH v3 bpf-next 2/2] tools/bpftool: fix bpftool build with bintutils >= 2.9

2017-12-27 Thread Jakub Kicinski
On Wed, 27 Dec 2017 19:16:29 +, Roman Gushchin wrote:
> Bpftool build is broken with binutils version 2.29 and later.
> The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> in the binutils repo, which changed the disassembler() function
> signature.
> 
> Fix this by adding a new "feature" to the tools/build/features
> infrastructure and make it responsible for decision which
> disassembler() function signature to use.
> 
> Signed-off-by: Roman Gushchin 
> Cc: Jakub Kicinski 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 

Acked-by: Jakub Kicinski 


Re: [PATCH] Documentation: gpu/i915: rename intel_guc_loader.c

2017-12-27 Thread Randy Dunlap
On 12/27/2017 07:28 PM, Joey Pabalinas wrote:
> Commit e8668bbcb0f91c7baa ("drm/i915/guc: Rename intel_guc_loader.c to
> intel_guc_fw.c") renamed drivers/gpu/drm/i915/intel_guc_loader.c
> but didn't update Documentation/gpu/i915.rst. Change intel_guc_loader.c
> to intel_guc_fw.c in Documentation/gpu/i915.rst.
> 
> Signed-off-by: Joey Pabalinas 

https://marc.info/?l=dri-devel=151237442131113=2

Already applied to drm/ tree.

> ---
>  Documentation/gpu/i915.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 2e7ee0313c1cd6c037..e21698e16534de9ccc 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -341,10 +341,10 @@ GuC
>  GuC-specific firmware loader
>  
>  
> -.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c
> +.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c
> :doc: GuC-specific firmware loader
>  
> -.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c
> +.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c
> :internal:
>  
>  GuC-based command submission
> 


-- 
~Randy


Re: [PATCH v3 bpf-next 2/2] tools/bpftool: fix bpftool build with bintutils >= 2.9

2017-12-27 Thread Jakub Kicinski
On Wed, 27 Dec 2017 19:16:29 +, Roman Gushchin wrote:
> Bpftool build is broken with binutils version 2.29 and later.
> The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> in the binutils repo, which changed the disassembler() function
> signature.
> 
> Fix this by adding a new "feature" to the tools/build/features
> infrastructure and make it responsible for decision which
> disassembler() function signature to use.
> 
> Signed-off-by: Roman Gushchin 
> Cc: Jakub Kicinski 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 

Acked-by: Jakub Kicinski 


Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-27 Thread Steven Rostedt
On Wed, 27 Dec 2017 19:45:42 -0800
Alexei Starovoitov  wrote:

> I don't think that's the case. My reading of current
> trace_kprobe_ftrace() -> arch_check_ftrace_location()
> is that it will not be true for old mcount case.

In the old mcount case, you can't use ftrace to return without calling
the function. That is, no modification of the return ip, unless you
created a trampoline that could handle arbitrary stack frames, and
remove them from the stack before returning back to the function.

> 
> As far as the rest of your arguments it very much puzzles me that
> you claim that this patch suppose to work based on historical
> reasoning whereas you did NOT test it.

I believe that Masami is saying that the modification of the IP from
kprobes has been very well tested. But I'm guessing that you still want
a test case for using kprobes in this particular instance. It's not the
implementation of modifying the IP that you are worried about, but the
implementation of BPF using it in this case. Right?

-- Steve


Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-27 Thread Steven Rostedt
On Wed, 27 Dec 2017 19:45:42 -0800
Alexei Starovoitov  wrote:

> I don't think that's the case. My reading of current
> trace_kprobe_ftrace() -> arch_check_ftrace_location()
> is that it will not be true for old mcount case.

In the old mcount case, you can't use ftrace to return without calling
the function. That is, no modification of the return ip, unless you
created a trampoline that could handle arbitrary stack frames, and
remove them from the stack before returning back to the function.

> 
> As far as the rest of your arguments it very much puzzles me that
> you claim that this patch suppose to work based on historical
> reasoning whereas you did NOT test it.

I believe that Masami is saying that the modification of the IP from
kprobes has been very well tested. But I'm guessing that you still want
a test case for using kprobes in this particular instance. It's not the
implementation of modifying the IP that you are worried about, but the
implementation of BPF using it in this case. Right?

-- Steve


Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Eric Ren

Hi,


On 12/27/2017 05:29 PM, Gang He wrote:

If we can't get inode lock immediately in the function
ocfs2_inode_lock_with_page() when reading a page, we should not
return directly here, since this will lead to a softlockup problem.
The method is to get a blocking lock and immediately unlock before
returning, this can avoid CPU resource waste due to lots of retries,
and benefits fairness in getting lock among multiple nodes, increase
efficiency in case modifying the same file frequently from multiple
nodes.
The softlockup problem looks like,
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
   
   dump_stack+0x5c/0x82
   panic+0xd5/0x21e
   watchdog_timer_fn+0x208/0x210
   ? watchdog_park_threads+0x70/0x70
   __hrtimer_run_queues+0xcc/0x200
   hrtimer_interrupt+0xa6/0x1f0
   smp_apic_timer_interrupt+0x34/0x50
   apic_timer_interrupt+0x96/0xa0
   
  RIP: 0010:unlock_page+0x17/0x30
  RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
  RAX: dead0100 RBX: f21e009f5300 RCX: 0004
  RDX: dead00ff RSI: 0202 RDI: f21e009f5300
  RBP:  R08:  R09: af154080bb00
  R10: af154080bc30 R11: 0040 R12: 993749a39518
  R13:  R14: f21e009f5300 R15: f21e009f5300
   ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
   ocfs2_readpage+0x41/0x2d0 [ocfs2]
   ? pagecache_get_page+0x30/0x200
   filemap_fault+0x12b/0x5c0
   ? recalc_sigpending+0x17/0x50
   ? __set_task_blocked+0x28/0x70
   ? __set_current_blocked+0x3d/0x60
   ocfs2_fault+0x29/0xb0 [ocfs2]
   __do_fault+0x1a/0xa0
   __handle_mm_fault+0xbe8/0x1090
   handle_mm_fault+0xaa/0x1f0
   __do_page_fault+0x235/0x4b0
   trace_do_page_fault+0x3c/0x110
   async_page_fault+0x28/0x30
  RIP: 0033:0x7fa75ded638e
  RSP: 002b:7ffd6657db18 EFLAGS: 00010287
  RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
  RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
  RBP: 0003 R08: 000e R09: 
  R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
  R13: 000e R14: 1770 R15: 

Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He 


On most linux server, CONFIG_PREEMPT is not set for better system-wide 
throughtput.
The long-time retry logic for getting page lock and inode lock can 
easily cause softlock,
resulting in real time task like corosync when using pcmk stack cannot 
be scheduled

on time.

When multiple nodes concurrently write the same file, the performance 
cannot be good

anyway, and it's also less possibility.

The trick for avoiding the busy loop looks good to me.

Reviewed-by: z...@suse.com

Thanks,
Eric


---
  fs/ocfs2/dlmglue.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 4689940..5193218 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
if (ret == -EAGAIN) {
unlock_page(page);
+   /*
+* If we can't get inode lock immediately, we should not return
+* directly here, since this will lead to a softlockup problem.
+* The method is to get a blocking lock and immediately unlock
+* before returning, this can avoid CPU resource waste due to
+* lots of retries, and benefits fairness in getting lock.
+*/
+   if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
+   ocfs2_inode_unlock(inode, ex);
ret = AOP_TRUNCATED_PAGE;
}
  




Re: [Ocfs2-devel] [PATCH] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-27 Thread Eric Ren

Hi,


On 12/27/2017 05:29 PM, Gang He wrote:

If we can't get inode lock immediately in the function
ocfs2_inode_lock_with_page() when reading a page, we should not
return directly here, since this will lead to a softlockup problem.
The method is to get a blocking lock and immediately unlock before
returning, this can avoid CPU resource waste due to lots of retries,
and benefits fairness in getting lock among multiple nodes, increase
efficiency in case modifying the same file frequently from multiple
nodes.
The softlockup problem looks like,
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
   
   dump_stack+0x5c/0x82
   panic+0xd5/0x21e
   watchdog_timer_fn+0x208/0x210
   ? watchdog_park_threads+0x70/0x70
   __hrtimer_run_queues+0xcc/0x200
   hrtimer_interrupt+0xa6/0x1f0
   smp_apic_timer_interrupt+0x34/0x50
   apic_timer_interrupt+0x96/0xa0
   
  RIP: 0010:unlock_page+0x17/0x30
  RSP: :af154080bc88 EFLAGS: 0246 ORIG_RAX: ff10
  RAX: dead0100 RBX: f21e009f5300 RCX: 0004
  RDX: dead00ff RSI: 0202 RDI: f21e009f5300
  RBP:  R08:  R09: af154080bb00
  R10: af154080bc30 R11: 0040 R12: 993749a39518
  R13:  R14: f21e009f5300 R15: f21e009f5300
   ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
   ocfs2_readpage+0x41/0x2d0 [ocfs2]
   ? pagecache_get_page+0x30/0x200
   filemap_fault+0x12b/0x5c0
   ? recalc_sigpending+0x17/0x50
   ? __set_task_blocked+0x28/0x70
   ? __set_current_blocked+0x3d/0x60
   ocfs2_fault+0x29/0xb0 [ocfs2]
   __do_fault+0x1a/0xa0
   __handle_mm_fault+0xbe8/0x1090
   handle_mm_fault+0xaa/0x1f0
   __do_page_fault+0x235/0x4b0
   trace_do_page_fault+0x3c/0x110
   async_page_fault+0x28/0x30
  RIP: 0033:0x7fa75ded638e
  RSP: 002b:7ffd6657db18 EFLAGS: 00010287
  RAX: 55c7662fb700 RBX: 0001 RCX: 55c7662fb700
  RDX: 1770 RSI: 7fa75e909000 RDI: 55c7662fb700
  RBP: 0003 R08: 000e R09: 
  R10: 0483 R11: 7fa75ded61b0 R12: 7fa75e90a770
  R13: 000e R14: 1770 R15: 

Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He 


On most linux server, CONFIG_PREEMPT is not set for better system-wide 
throughtput.
The long-time retry logic for getting page lock and inode lock can 
easily cause softlock,
resulting in real time task like corosync when using pcmk stack cannot 
be scheduled

on time.

When multiple nodes concurrently write the same file, the performance 
cannot be good

anyway, and it's also less possibility.

The trick for avoiding the busy loop looks good to me.

Reviewed-by: z...@suse.com

Thanks,
Eric


---
  fs/ocfs2/dlmglue.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 4689940..5193218 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
if (ret == -EAGAIN) {
unlock_page(page);
+   /*
+* If we can't get inode lock immediately, we should not return
+* directly here, since this will lead to a softlockup problem.
+* The method is to get a blocking lock and immediately unlock
+* before returning, this can avoid CPU resource waste due to
+* lots of retries, and benefits fairness in getting lock.
+*/
+   if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
+   ocfs2_inode_unlock(inode, ex);
ret = AOP_TRUNCATED_PAGE;
}
  




Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework

2017-12-27 Thread Alexei Starovoitov

On 12/27/17 5:38 PM, Masami Hiramatsu wrote:

On Wed, 27 Dec 2017 14:49:46 -0800
Alexei Starovoitov  wrote:


On 12/27/17 12:09 AM, Masami Hiramatsu wrote:

On Tue, 26 Dec 2017 18:12:56 -0800
Alexei Starovoitov  wrote:


On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:

Support in-kernel fault-injection framework via debugfs.
This allows you to inject a conditional error to specified
function using debugfs interfaces.

Signed-off-by: Masami Hiramatsu 
---
 Documentation/fault-injection/fault-injection.txt |5 +
 kernel/Makefile   |1
 kernel/fail_function.c|  169 +
 lib/Kconfig.debug |   10 +
 4 files changed, 185 insertions(+)
 create mode 100644 kernel/fail_function.c

diff --git a/Documentation/fault-injection/fault-injection.txt 
b/Documentation/fault-injection/fault-injection.txt
index 918972babcd8..6243a588dd71 100644
--- a/Documentation/fault-injection/fault-injection.txt
+++ b/Documentation/fault-injection/fault-injection.txt
@@ -30,6 +30,11 @@ o fail_mmc_request
   injects MMC data errors on devices permitted by setting
   debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request

+o fail_function
+
+  injects error return on specific functions by setting debugfs entries
+  under /sys/kernel/debug/fail_function. No boot option supported.


I like it.
Could you document it a bit better?


Yes, I will do in next series.


In particular retval is configurable, but without an example no one
will be able to figure out how to use it.


Ah, right. BTW, as I pointed in the covermail, should we store the
expected error value range into the injectable list? e.g.

ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)

And provide APIs to check/get it.


I'm afraid such check would be too costly.
Right now we have only two functions marked but I expect hundreds more
will be added in the near future as soon as developers realize the
potential of such error injection.
All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data.
Multiple by 1k and we have 8k of data spent on marks.
If we add max/min range marks that doubles it for very little use.
I think marking function only is enough.


Sorry, I don't think so.
Even if it takes 16 bytes more for each points, I don't think it is
any overhead for machines in these days. Even if so, we can provide
a kconfig to reduce it.
I mean, we are living in GB-order memory are, and it will be bigger
in the future. Why we have to worry about hundreds of 16bytes memory
pieces? It will take a few KB, and even if we mark thousands of
functions, it never reaches 1MB, in GB memory pool. :)

Of course, for many small-footprint embedded devices (like having
less than 128MB memory), this feature can be a overhead. But they
can cut off the table by kconfig.


I still disagree on wasting 16-byte * num_of_funcs of .data here.
The trade-off of usability vs memory just not worth it. Sorry.
Please focus on testing your changes instead.




Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework

2017-12-27 Thread Alexei Starovoitov

On 12/27/17 5:38 PM, Masami Hiramatsu wrote:

On Wed, 27 Dec 2017 14:49:46 -0800
Alexei Starovoitov  wrote:


On 12/27/17 12:09 AM, Masami Hiramatsu wrote:

On Tue, 26 Dec 2017 18:12:56 -0800
Alexei Starovoitov  wrote:


On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:

Support in-kernel fault-injection framework via debugfs.
This allows you to inject a conditional error to specified
function using debugfs interfaces.

Signed-off-by: Masami Hiramatsu 
---
 Documentation/fault-injection/fault-injection.txt |5 +
 kernel/Makefile   |1
 kernel/fail_function.c|  169 +
 lib/Kconfig.debug |   10 +
 4 files changed, 185 insertions(+)
 create mode 100644 kernel/fail_function.c

diff --git a/Documentation/fault-injection/fault-injection.txt 
b/Documentation/fault-injection/fault-injection.txt
index 918972babcd8..6243a588dd71 100644
--- a/Documentation/fault-injection/fault-injection.txt
+++ b/Documentation/fault-injection/fault-injection.txt
@@ -30,6 +30,11 @@ o fail_mmc_request
   injects MMC data errors on devices permitted by setting
   debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request

+o fail_function
+
+  injects error return on specific functions by setting debugfs entries
+  under /sys/kernel/debug/fail_function. No boot option supported.


I like it.
Could you document it a bit better?


Yes, I will do in next series.


In particular retval is configurable, but without an example no one
will be able to figure out how to use it.


Ah, right. BTW, as I pointed in the covermail, should we store the
expected error value range into the injectable list? e.g.

ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)

And provide APIs to check/get it.


I'm afraid such check would be too costly.
Right now we have only two functions marked but I expect hundreds more
will be added in the near future as soon as developers realize the
potential of such error injection.
All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data.
Multiple by 1k and we have 8k of data spent on marks.
If we add max/min range marks that doubles it for very little use.
I think marking function only is enough.


Sorry, I don't think so.
Even if it takes 16 bytes more for each points, I don't think it is
any overhead for machines in these days. Even if so, we can provide
a kconfig to reduce it.
I mean, we are living in GB-order memory are, and it will be bigger
in the future. Why we have to worry about hundreds of 16bytes memory
pieces? It will take a few KB, and even if we mark thousands of
functions, it never reaches 1MB, in GB memory pool. :)

Of course, for many small-footprint embedded devices (like having
less than 128MB memory), this feature can be a overhead. But they
can cut off the table by kconfig.


I still disagree on wasting 16-byte * num_of_funcs of .data here.
The trade-off of usability vs memory just not worth it. Sorry.
Please focus on testing your changes instead.




Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-27 Thread Alexei Starovoitov

On 12/27/17 6:34 PM, Masami Hiramatsu wrote:

On Wed, 27 Dec 2017 14:46:24 -0800
Alexei Starovoitov  wrote:


On 12/26/17 9:56 PM, Masami Hiramatsu wrote:

On Tue, 26 Dec 2017 17:57:32 -0800
Alexei Starovoitov  wrote:


On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:

Check whether error injectable event is on function entry or not.
Currently it checks the event is ftrace-based kprobes or not,
but that is wrong. It should check if the event is on the entry
of target function. Since error injection will override a function
to just return with modified return value, that operation must
be done before the target function starts making stackframe.

As a side effect, bpf error injection is no need to depend on
function-tracer. It can work with sw-breakpoint based kprobe
events too.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/Kconfig|2 --
 kernel/trace/bpf_trace.c|6 +++---
 kernel/trace/trace_kprobe.c |8 +---
 kernel/trace/trace_probe.h  |   12 ++--
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index ae3a2d519e50..6400e1bf97c5 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -533,9 +533,7 @@ config FUNCTION_PROFILER
 config BPF_KPROBE_OVERRIDE
bool "Enable BPF programs to override a kprobed function"
depends on BPF_EVENTS
-   depends on KPROBES_ON_FTRACE
depends on HAVE_KPROBE_OVERRIDE
-   depends on DYNAMIC_FTRACE_WITH_REGS
default n
help
 Allows BPF to override the execution of a probed function and
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f6d2327ecb59..d663660f8392 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
int ret = -EEXIST;

/*
-* Kprobe override only works for ftrace based kprobes, and only if they
-* are on the opt-in list.
+* Kprobe override only works if they are on the function entry,
+* and only if they are on the opt-in list.
 */
if (prog->kprobe_override &&
-   (!trace_kprobe_ftrace(event->tp_event) ||
+   (!trace_kprobe_on_func_entry(event->tp_event) ||
 !trace_kprobe_error_injectable(event->tp_event)))
return -EINVAL;

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 91f4b57dab82..265e3e27e8dc 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -88,13 +88,15 @@ static nokprobe_inline unsigned long 
trace_kprobe_nhit(struct trace_kprobe *tk)
return nhit;
 }

-int trace_kprobe_ftrace(struct trace_event_call *call)
+bool trace_kprobe_on_func_entry(struct trace_event_call *call)
 {
struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
-   return kprobe_ftrace(>rp.kp);
+
+   return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name,
+   tk->rp.kp.offset);


That would be nice, but did you test this?


Yes, because the jprobe, which was only official user of modifying execution
path using kprobe, did same way to check. (and kretprobe also does it)


My understanding that kprobe will restore all regs and
here we need to override return ip _and_ value.


yes, no problem. kprobe restore all regs from pt_regs, including regs->ip.


Could you add a patch with the test the way Josef did
or describe the steps to test this new mode?


Would you mean below patch? If so, it should work without any change.

 [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return


yeah. I expect bpf_override_return test to work as-is.
I'm asking for the test for new functionality added by this patch.
In particular kprobe on func entry without ftrace.
How did you test it?


This function is used in kretprobe and jprobe. Jprobe was the user of
"modifying instruction pointer to another function" in kprobes.
If it doesn't work, jprobe also doesn't work, this means you can not
modify IP by kprobes anymore.
Anyway, until linux-4.13, that was well tested by kprobe smoke test.


and how I can repeat the test?
I'm still not sure that it works correctly.


That works correctly because it checks given address is on the entry
point (the 1st instruction) of a function, using kallsyms.

The reason why I made another flag for ftrace was, there are 2 modes
for ftrace dynamic instrumentation, fentry and mcount.
With new fentry mode, ftrace will be put on the first instruction
of the function, so it will work as you expected.
With traditional gcc mcount, ftrace will be called after making call
frame for _mcount(). This means if you modify ip, it will not work
or cause a trouble because _mcount call frame is still on the stack.

So, current ftrace-based checker doesn't work, it depends on the case.
Of course, in most 

Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-27 Thread Alexei Starovoitov

On 12/27/17 6:34 PM, Masami Hiramatsu wrote:

On Wed, 27 Dec 2017 14:46:24 -0800
Alexei Starovoitov  wrote:


On 12/26/17 9:56 PM, Masami Hiramatsu wrote:

On Tue, 26 Dec 2017 17:57:32 -0800
Alexei Starovoitov  wrote:


On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:

Check whether error injectable event is on function entry or not.
Currently it checks the event is ftrace-based kprobes or not,
but that is wrong. It should check if the event is on the entry
of target function. Since error injection will override a function
to just return with modified return value, that operation must
be done before the target function starts making stackframe.

As a side effect, bpf error injection is no need to depend on
function-tracer. It can work with sw-breakpoint based kprobe
events too.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/Kconfig|2 --
 kernel/trace/bpf_trace.c|6 +++---
 kernel/trace/trace_kprobe.c |8 +---
 kernel/trace/trace_probe.h  |   12 ++--
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index ae3a2d519e50..6400e1bf97c5 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -533,9 +533,7 @@ config FUNCTION_PROFILER
 config BPF_KPROBE_OVERRIDE
bool "Enable BPF programs to override a kprobed function"
depends on BPF_EVENTS
-   depends on KPROBES_ON_FTRACE
depends on HAVE_KPROBE_OVERRIDE
-   depends on DYNAMIC_FTRACE_WITH_REGS
default n
help
 Allows BPF to override the execution of a probed function and
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f6d2327ecb59..d663660f8392 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
int ret = -EEXIST;

/*
-* Kprobe override only works for ftrace based kprobes, and only if they
-* are on the opt-in list.
+* Kprobe override only works if they are on the function entry,
+* and only if they are on the opt-in list.
 */
if (prog->kprobe_override &&
-   (!trace_kprobe_ftrace(event->tp_event) ||
+   (!trace_kprobe_on_func_entry(event->tp_event) ||
 !trace_kprobe_error_injectable(event->tp_event)))
return -EINVAL;

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 91f4b57dab82..265e3e27e8dc 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -88,13 +88,15 @@ static nokprobe_inline unsigned long 
trace_kprobe_nhit(struct trace_kprobe *tk)
return nhit;
 }

-int trace_kprobe_ftrace(struct trace_event_call *call)
+bool trace_kprobe_on_func_entry(struct trace_event_call *call)
 {
struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
-   return kprobe_ftrace(>rp.kp);
+
+   return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name,
+   tk->rp.kp.offset);


That would be nice, but did you test this?


Yes, because the jprobe, which was only official user of modifying execution
path using kprobe, did same way to check. (and kretprobe also does it)


My understanding that kprobe will restore all regs and
here we need to override return ip _and_ value.


yes, no problem. kprobe restore all regs from pt_regs, including regs->ip.


Could you add a patch with the test the way Josef did
or describe the steps to test this new mode?


Would you mean below patch? If so, it should work without any change.

 [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return


yeah. I expect bpf_override_return test to work as-is.
I'm asking for the test for new functionality added by this patch.
In particular kprobe on func entry without ftrace.
How did you test it?


This function is used in kretprobe and jprobe. Jprobe was the user of
"modifying instruction pointer to another function" in kprobes.
If it doesn't work, jprobe also doesn't work, this means you can not
modify IP by kprobes anymore.
Anyway, until linux-4.13, that was well tested by kprobe smoke test.


and how I can repeat the test?
I'm still not sure that it works correctly.


That works correctly because it checks given address is on the entry
point (the 1st instruction) of a function, using kallsyms.

The reason why I made another flag for ftrace was, there are 2 modes
for ftrace dynamic instrumentation, fentry and mcount.
With new fentry mode, ftrace will be put on the first instruction
of the function, so it will work as you expected.
With traditional gcc mcount, ftrace will be called after making call
frame for _mcount(). This means if you modify ip, it will not work
or cause a trouble because _mcount call frame is still on the stack.

So, current ftrace-based checker doesn't work, it depends on the case.
Of course, in most case, kernel will be build in new gcc which
supports fentry, but 

Re: [PATCH] objtool: Fix clang enum conversion warning

2017-12-27 Thread Nick Desaulniers
On Wed, Dec 27, 2017 at 12:38 PM, Josh Poimboeuf  wrote:
> On Wed, Dec 27, 2017 at 01:34:34PM +0100, Lukas Bulwahn wrote:
>> Assuming that the authorship of this one-line change does not matter, as it
>> is largely suggested by the clang compiler anyway, and we want to move the
>> change forward, we should decide on which of three patches to move
>> forward. I can give my Reviewed-by and Tested-by to any of them.

I suppose Ingo would take the first and accumulate Reviewed-By tags.
I don't particularly care about authorship (please just fix the bug).

> The patch from Lukas was the first one I received, so that's the one I
> used.  I rewrote the commit msg for clarity and added my SOB and sent it
> to Ingo for merging.

I think you should have kept Nicholas Mc Guire's Reviewed by tag?
Maybe Ingo can re-add his and mine when merging?


Re: [PATCH] objtool: Fix clang enum conversion warning

2017-12-27 Thread Nick Desaulniers
On Wed, Dec 27, 2017 at 12:38 PM, Josh Poimboeuf  wrote:
> On Wed, Dec 27, 2017 at 01:34:34PM +0100, Lukas Bulwahn wrote:
>> Assuming that the authorship of this one-line change does not matter, as it
>> is largely suggested by the clang compiler anyway, and we want to move the
>> change forward, we should decide on which of three patches to move
>> forward. I can give my Reviewed-by and Tested-by to any of them.

I suppose Ingo would take the first and accumulate Reviewed-By tags.
I don't particularly care about authorship (please just fix the bug).

> The patch from Lukas was the first one I received, so that's the one I
> used.  I rewrote the commit msg for clarity and added my SOB and sent it
> to Ingo for merging.

I think you should have kept Nicholas Mc Guire's Reviewed by tag?
Maybe Ingo can re-add his and mine when merging?


[PATCH -next] xen/pvcalls: use GFP_ATOMIC under spin lock

2017-12-27 Thread Wei Yongjun
A spin lock is taken here so we should use GFP_ATOMIC.

Fixes: 9774c6cca266 ("xen/pvcalls: implement accept command")
Signed-off-by: Wei Yongjun 
---
 drivers/xen/pvcalls-front.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 0c1ec68..dfd00d9 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -807,7 +807,7 @@ int pvcalls_front_accept(struct socket *sock, struct socket 
*newsock, int flags)
pvcalls_exit();
return ret;
}
-   map2 = kzalloc(sizeof(*map2), GFP_KERNEL);
+   map2 = kzalloc(sizeof(*map2), GFP_ATOMIC);
if (map2 == NULL) {
clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
  (void *)>passive.flags);



[PATCH -next] xen/pvcalls: use GFP_ATOMIC under spin lock

2017-12-27 Thread Wei Yongjun
A spin lock is taken here so we should use GFP_ATOMIC.

Fixes: 9774c6cca266 ("xen/pvcalls: implement accept command")
Signed-off-by: Wei Yongjun 
---
 drivers/xen/pvcalls-front.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 0c1ec68..dfd00d9 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -807,7 +807,7 @@ int pvcalls_front_accept(struct socket *sock, struct socket 
*newsock, int flags)
pvcalls_exit();
return ret;
}
-   map2 = kzalloc(sizeof(*map2), GFP_KERNEL);
+   map2 = kzalloc(sizeof(*map2), GFP_ATOMIC);
if (map2 == NULL) {
clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
  (void *)>passive.flags);



  1   2   3   4   5   6   7   8   9   10   >