Re: [RFC PATCH bpf-next] bpf: change syscall_nr type to int in struct syscall_tp_t

2023-10-13 Thread Artem Savkov
On Thu, Oct 12, 2023 at 04:32:51PM -0700, Andrii Nakryiko wrote:
> On Thu, Oct 12, 2023 at 6:43 AM Steven Rostedt  wrote:
> >
> > On Thu, 12 Oct 2023 13:45:50 +0200
> > Artem Savkov  wrote:
> >
> > > linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support
> > > for lazy preemption")) that adds an extra member to struct trace_entry.
> > > This causes the offset of args field in struct trace_event_raw_sys_enter
> > > to be different from the one in struct syscall_trace_enter:
> > >
> > > struct trace_event_raw_sys_enter {
> > >         struct trace_entry         ent;                  /*     0    12 */
> > >
> > >         /* XXX last struct has 3 bytes of padding */
> > >         /* XXX 4 bytes hole, try to pack */
> > >
> > >         long int                   id;                   /*    16     8 */
> > >         long unsigned int          args[6];              /*    24    48 */
> > >         /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> > >         char                       __data[];             /*    72     0 */
> > >
> > >         /* size: 72, cachelines: 2, members: 4 */
> > >         /* sum members: 68, holes: 1, sum holes: 4 */
> > >         /* paddings: 1, sum paddings: 3 */
> > >         /* last cacheline: 8 bytes */
> > > };
> > >
> > > struct syscall_trace_enter {
> > >         struct trace_entry         ent;                  /*     0    12 */
> > >
> > >         /* XXX last struct has 3 bytes of padding */
> > >
> > >         int                        nr;                   /*    12     4 */
> > >         long unsigned int          args[];               /*    16     0 */
> > >
> > >         /* size: 16, cachelines: 1, members: 3 */
> > >         /* paddings: 1, sum paddings: 3 */
> > >         /* last cacheline: 16 bytes */
> > > };
> > >
> > > This, in turn, causes perf_event_set_bpf_prog() to fail while running bpf
> > > test_profiler testcase because max_ctx_offset is calculated based on the
> > > former struct, while off is based on the latter:
> > >
> > >         if (is_tracepoint || is_syscall_tp) {
> > >                 int off = trace_event_get_offsets(event->tp_event);
> > >
> > >                 if (prog->aux->max_ctx_offset > off)
> > >                         return -EACCES;
> > >         }
> > >
> > > What the bpf program actually gets is a pointer to struct
> > > syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes
> > > the problem by aligning struct syscall_tp_t with struct
> > > syscall_trace_(enter|exit) and changing the tests to use these structs
> > > to dereference context.
> > >
> > > Signed-off-by: Artem Savkov 
> >
> 
> I think these changes make sense regardless, can you please resend the
> patch without RFC tag so that our CI can run tests for it?

Ok, didn't know it was set up like that.

> > Thanks for doing a proper fix.
> >
> > Acked-by: Steven Rostedt (Google) 
> 
> But looking at [0] and briefly reading some of the discussions you,
> Steven, had, I'm just wondering if it would be best to avoid
> increasing struct trace_entry altogether? It seems like preempt_count
> is actually a 4-bit field in trace context, so it doesn't seem like we
> really need to allocate an entire byte for both preempt_count and
> preempt_lazy_count. Why can't we just combine them and not waste 8
> extra bytes for each trace event in a ring buffer?
> 
>   [0] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?id=b1773eac3f29cbdcdfd16e0339f1a164066e9f71

I agree that avoiding an increase in struct trace_entry size would be very
desirable, but I don't know whether the rt developers had reasons to
do it like this.
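
For illustration only, a minimal sketch of what combining the two counters
into one byte could look like, assuming each count really fits in four bits.
The helper names below are hypothetical and are not taken from any tree;
values above 15 are clamped, which is also an assumption:

```c
#include <stdint.h>

/* Hypothetical helper: clamp a counter to what fits in four bits. */
static inline uint8_t clamp_nibble(unsigned int v)
{
	return v > 0xf ? 0xf : (uint8_t)v;
}

/* Hypothetical layout: preempt_count in the low nibble,
 * preempt_lazy_count in the high nibble of the same byte. */
static inline uint8_t pack_preempt_counts(unsigned int preempt_count,
					  unsigned int preempt_lazy_count)
{
	return (uint8_t)(clamp_nibble(preempt_count) |
			 (clamp_nibble(preempt_lazy_count) << 4));
}

/* Recover the (possibly clamped) preempt_count. */
static inline unsigned int unpack_preempt_count(uint8_t packed)
{
	return packed & 0xf;
}

/* Recover the (possibly clamped) preempt_lazy_count. */
static inline unsigned int unpack_preempt_lazy_count(uint8_t packed)
{
	return packed >> 4;
}
```

With a layout like this struct trace_entry would not need an extra member,
at the cost of losing counts above 15 in the trace output.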

Nevertheless, I think the issue with the verifier checking against the
wrong struct still needs to be addressed.
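
To make the mismatch concrete, here is a rough sketch of the comparison done
in perf_event_set_bpf_prog(), using the offsets from the pahole output quoted
above (24 and 16, with 8-byte longs). The helper names are hypothetical, and
the way the event size is derived in event_size() is an assumption made for
the example, not a copy of trace_event_get_offsets():

```c
#include <errno.h>

/* Offsets taken from the pahole output; 8-byte longs assumed. */
enum {
	RAW_SYS_ENTER_ARGS_OFF = 24,		/* trace_event_raw_sys_enter.args */
	SYSCALL_TRACE_ENTER_ARGS_OFF = 16,	/* syscall_trace_enter.args */
	ARG_SIZE = 8,
};

/* End offset of args[idx] when the program dereferences the context
 * through a struct whose args array starts at args_off. */
static int ctx_end_of_arg(int args_off, int idx)
{
	return args_off + (idx + 1) * ARG_SIZE;
}

/* Assumption: the size checked against covers the syscall_trace_enter
 * header plus nb_args arguments. */
static int event_size(int nb_args)
{
	return SYSCALL_TRACE_ENTER_ARGS_OFF + nb_args * ARG_SIZE;
}

/* The same comparison as in the snippet from perf_event_set_bpf_prog(). */
static int check_ctx(int max_ctx_offset, int off)
{
	return max_ctx_offset > off ? -EACCES : 0;
}
```

Under these assumptions, a program on a two-argument syscall that reads
args[1] through struct trace_event_raw_sys_enter touches bytes up to 40 while
the event is only 32 bytes long, so check_ctx() returns -EACCES. The same read
through struct syscall_trace_enter ends at 32 and passes.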

-- 
Regards,
  Artem




[PATCH bpf-next] bpf: change syscall_nr type to int in struct syscall_tp_t

2023-10-12 Thread Artem Savkov
linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support
for lazy preemption")) that adds an extra member to struct trace_entry.
This causes the offset of args field in struct trace_event_raw_sys_enter
to be different from the one in struct syscall_trace_enter:

struct trace_event_raw_sys_enter {
        struct trace_entry         ent;                  /*     0    12 */

        /* XXX last struct has 3 bytes of padding */
        /* XXX 4 bytes hole, try to pack */

        long int                   id;                   /*    16     8 */
        long unsigned int          args[6];              /*    24    48 */
        /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
        char                       __data[];             /*    72     0 */

        /* size: 72, cachelines: 2, members: 4 */
        /* sum members: 68, holes: 1, sum holes: 4 */
        /* paddings: 1, sum paddings: 3 */
        /* last cacheline: 8 bytes */
};

struct syscall_trace_enter {
        struct trace_entry         ent;                  /*     0    12 */

        /* XXX last struct has 3 bytes of padding */

        int                        nr;                   /*    12     4 */
        long unsigned int          args[];               /*    16     0 */

        /* size: 16, cachelines: 1, members: 3 */
        /* paddings: 1, sum paddings: 3 */
        /* last cacheline: 16 bytes */
};

This, in turn, causes perf_event_set_bpf_prog() to fail while running bpf
test_profiler testcase because max_ctx_offset is calculated based on the
former struct, while off is based on the latter:

        if (is_tracepoint || is_syscall_tp) {
                int off = trace_event_get_offsets(event->tp_event);

                if (prog->aux->max_ctx_offset > off)
                        return -EACCES;
        }

What the bpf program actually gets is a pointer to struct
syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes
the problem by aligning struct syscall_tp_t with struct
syscall_trace_(enter|exit) and changing the tests to use these structs
to dereference context.

Signed-off-by: Artem Savkov 
Acked-by: Steven Rostedt (Google) 

---
 kernel/trace/trace_syscalls.c                    | 4 ++--
 tools/testing/selftests/bpf/progs/profiler.inc.h | 2 +-
 tools/testing/selftests/bpf/progs/test_vmlinux.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index de753403cdafb..9c581d6da843a 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -556,7 +556,7 @@ static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *re
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
-		unsigned long syscall_nr;
+		int syscall_nr;
 		unsigned long args[SYSCALL_DEFINE_MAXARGS];
 	} __aligned(8) param;
 	int i;
@@ -661,7 +661,7 @@ static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *reg
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
-		unsigned long syscall_nr;
+		int syscall_nr;
 		unsigned long ret;
 	} __aligned(8) param;
 
diff --git a/tools/testing/selftests/bpf/progs/profiler.inc.h b/tools/testing/selftests/bpf/progs/profiler.inc.h
index f799d87e87002..897061930cb76 100644
--- a/tools/testing/selftests/bpf/progs/profiler.inc.h
+++ b/tools/testing/selftests/bpf/progs/profiler.inc.h
@@ -609,7 +609,7 @@ ssize_t BPF_KPROBE(kprobe__proc_sys_write,
 }
 
 SEC("tracepoint/syscalls/sys_enter_kill")
-int tracepoint__syscalls__sys_enter_kill(struct trace_event_raw_sys_enter* ctx)
+int tracepoint__syscalls__sys_enter_kill(struct syscall_trace_enter* ctx)
 {
 	struct bpf_func_stats_ctx stats_ctx;
 
diff --git a/tools/testing/selftests/bpf/progs/test_vmlinux.c b/tools/testing/selftests/bpf/progs/test_vmlinux.c
index 4b8e37f7fd06c..78b23934d9f8f 100644
--- a/tools/testing/selftests/bpf/progs/test_vmlinux.c
+++ b/tools/testing/selftests/bpf/progs/test_vmlinux.c
@@ -16,12 +16,12 @@ bool kprobe_called = false;
 bool fentry_called = false;
 
 SEC("tp/syscalls/sys_enter_nanosleep")
-int handle__tp(struct trace_event_raw_sys_enter *args)
+int handle__tp(struct syscall_trace_enter *args)
 {
 	struct __kernel_timespec *ts;
 	long tv_nsec;
 
-	if (args->id != __NR_nanosleep)
+	if (args->nr != __NR_nanosleep)
 		return 0;
 
 	ts = (void *)args->args[0];
-- 
2.41.0




[RFC PATCH bpf-next] bpf: change syscall_nr type to int in struct syscall_tp_t

2023-10-12 Thread Artem Savkov
linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support
for lazy preemption")) that adds an extra member to struct trace_entry.
This causes the offset of args field in struct trace_event_raw_sys_enter
to be different from the one in struct syscall_trace_enter:

struct trace_event_raw_sys_enter {
        struct trace_entry         ent;                  /*     0    12 */

        /* XXX last struct has 3 bytes of padding */
        /* XXX 4 bytes hole, try to pack */

        long int                   id;                   /*    16     8 */
        long unsigned int          args[6];              /*    24    48 */
        /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
        char                       __data[];             /*    72     0 */

        /* size: 72, cachelines: 2, members: 4 */
        /* sum members: 68, holes: 1, sum holes: 4 */
        /* paddings: 1, sum paddings: 3 */
        /* last cacheline: 8 bytes */
};

struct syscall_trace_enter {
        struct trace_entry         ent;                  /*     0    12 */

        /* XXX last struct has 3 bytes of padding */

        int                        nr;                   /*    12     4 */
        long unsigned int          args[];               /*    16     0 */

        /* size: 16, cachelines: 1, members: 3 */
        /* paddings: 1, sum paddings: 3 */
        /* last cacheline: 16 bytes */
};

This, in turn, causes perf_event_set_bpf_prog() to fail while running bpf
test_profiler testcase because max_ctx_offset is calculated based on the
former struct, while off is based on the latter:

        if (is_tracepoint || is_syscall_tp) {
                int off = trace_event_get_offsets(event->tp_event);

                if (prog->aux->max_ctx_offset > off)
                        return -EACCES;
        }

What the bpf program actually gets is a pointer to struct
syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes
the problem by aligning struct syscall_tp_t with struct
syscall_trace_(enter|exit) and changing the tests to use these structs
to dereference context.

Signed-off-by: Artem Savkov 
---
 kernel/trace/trace_syscalls.c                    | 4 ++--
 tools/testing/selftests/bpf/progs/profiler.inc.h | 2 +-
 tools/testing/selftests/bpf/progs/test_vmlinux.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index de753403cdafb..9c581d6da843a 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -556,7 +556,7 @@ static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *re
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
-		unsigned long syscall_nr;
+		int syscall_nr;
 		unsigned long args[SYSCALL_DEFINE_MAXARGS];
 	} __aligned(8) param;
 	int i;
@@ -661,7 +661,7 @@ static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *reg
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
-		unsigned long syscall_nr;
+		int syscall_nr;
 		unsigned long ret;
 	} __aligned(8) param;
 
diff --git a/tools/testing/selftests/bpf/progs/profiler.inc.h b/tools/testing/selftests/bpf/progs/profiler.inc.h
index f799d87e87002..897061930cb76 100644
--- a/tools/testing/selftests/bpf/progs/profiler.inc.h
+++ b/tools/testing/selftests/bpf/progs/profiler.inc.h
@@ -609,7 +609,7 @@ ssize_t BPF_KPROBE(kprobe__proc_sys_write,
 }
 
 SEC("tracepoint/syscalls/sys_enter_kill")
-int tracepoint__syscalls__sys_enter_kill(struct trace_event_raw_sys_enter* ctx)
+int tracepoint__syscalls__sys_enter_kill(struct syscall_trace_enter* ctx)
 {
 	struct bpf_func_stats_ctx stats_ctx;
 
diff --git a/tools/testing/selftests/bpf/progs/test_vmlinux.c b/tools/testing/selftests/bpf/progs/test_vmlinux.c
index 4b8e37f7fd06c..78b23934d9f8f 100644
--- a/tools/testing/selftests/bpf/progs/test_vmlinux.c
+++ b/tools/testing/selftests/bpf/progs/test_vmlinux.c
@@ -16,12 +16,12 @@ bool kprobe_called = false;
 bool fentry_called = false;
 
 SEC("tp/syscalls/sys_enter_nanosleep")
-int handle__tp(struct trace_event_raw_sys_enter *args)
+int handle__tp(struct syscall_trace_enter *args)
 {
 	struct __kernel_timespec *ts;
 	long tv_nsec;
 
-	if (args->id != __NR_nanosleep)
+	if (args->nr != __NR_nanosleep)
 		return 0;
 
 	ts = (void *)args->args[0];
-- 
2.41.0




Re: [RFC PATCH] tracing: change syscall number type in struct syscall_trace_*

2023-10-05 Thread Artem Savkov
On Wed, Oct 04, 2023 at 02:55:47PM +0200, Artem Savkov wrote:
> On Tue, Oct 03, 2023 at 09:38:44PM -0400, Steven Rostedt wrote:
> > On Mon,  2 Oct 2023 15:52:42 +0200
> > Artem Savkov  wrote:
> > 
> > > linux-rt-devel tree contains a patch that adds an extra member to struct
> > > trace_entry. This causes the offset of args field in struct
> > > trace_event_raw_sys_enter to be different from the one in struct
> > > syscall_trace_enter:
> > 
> > This patch looks like it's fixing the symptom and not the issue. No code
> > should rely on the two event structures to be related. That's an unwanted
> > coupling, that will likely cause issues down the road (like the RT patch
> > you mentioned).
> 
> I agree, but I didn't see a better solution, and that was my way of
> starting a conversation, hence the RFC.
> 
> > > 
> > > struct trace_event_raw_sys_enter {
> > >         struct trace_entry         ent;                  /*     0    12 */
> > >
> > >         /* XXX last struct has 3 bytes of padding */
> > >         /* XXX 4 bytes hole, try to pack */
> > >
> > >         long int                   id;                   /*    16     8 */
> > >         long unsigned int          args[6];              /*    24    48 */
> > >         /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> > >         char                       __data[];             /*    72     0 */
> > >
> > >         /* size: 72, cachelines: 2, members: 4 */
> > >         /* sum members: 68, holes: 1, sum holes: 4 */
> > >         /* paddings: 1, sum paddings: 3 */
> > >         /* last cacheline: 8 bytes */
> > > };
> > >
> > > struct syscall_trace_enter {
> > >         struct trace_entry         ent;                  /*     0    12 */
> > >
> > >         /* XXX last struct has 3 bytes of padding */
> > >
> > >         int                        nr;                   /*    12     4 */
> > >         long unsigned int          args[];               /*    16     0 */
> > >
> > >         /* size: 16, cachelines: 1, members: 3 */
> > >         /* paddings: 1, sum paddings: 3 */
> > >         /* last cacheline: 16 bytes */
> > > };
> > >
> > > This, in turn, causes perf_event_set_bpf_prog() to fail while running bpf
> > > test_profiler testcase because max_ctx_offset is calculated based on the
> > > former struct, while off is based on the latter:
> > 
> > The above appears to be pointing to the real bug. The "is calculated based
> > on the former struct while off on the latter" Why are the two being used
> > together? They are supposed to be *unrelated*!
> > 
> > 
> > > 
> > >         if (is_tracepoint || is_syscall_tp) {
> > >                 int off = trace_event_get_offsets(event->tp_event);
> > 
> > So basically this is clumping together the raw_syscalls with the syscalls
> > events as if they are the same. But they are not. They are created
> > differently. It's basically like using one structure to get the offsets of
> > another structure. That would be a bug anyplace else in the kernel. Sounds
> > like it's a bug here too.
> > 
> > I think the issue is with this code, not the tracing code.
> > 
> > We could expose the struct syscall_trace_enter and syscall_trace_exit if
> > the offsets to those are needed.
> 
> I don't think we need the syscall_trace_* offsets; it looks like
> trace_event_get_offsets() should return the offset for
> trace_event_raw_sys_enter instead. I am still trying to figure out how all
> of this works together.
> Maybe Alexei or Andrii have more context here.

Turns out it is even more confusing. The tests dereference the context as
struct trace_event_raw_sys_enter so bpf verifier sets max_ctx_offset
based on that, then perf_event_set_bpf_prog() checks this offset against
the one in struct syscall_trace_enter, but what bpf prog really gets is
a pointer to struct syscall_tp_t from kernel/trace/trace_syscalls.c.

I don't know the history behind these decisions, but should the tests
dereference context as struct syscall_trace_enter instead and struct
syscall_tp_t be changed to have syscall_nr as int?
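
The three layouts involved can be compared with a small userspace sketch. It
assumes an LP64 target such as x86_64 and the GCC/Clang aligned attribute;
the fake_* structs are hypothetical stand-ins modelled on the kernel structs
named above, with a 12-byte trace_entry as on rt-devel, and the *_args_off()
helpers exist only for this example:

```c
#include <stddef.h>

/* Stand-in for struct trace_entry with the extra rt-devel member:
 * 9 bytes of members, padded to 12. */
struct fake_trace_entry {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
	unsigned char preempt_lazy_count;
};

/* What the tests dereference, so what max_ctx_offset is based on. */
struct fake_raw_sys_enter {
	struct fake_trace_entry ent;
	long id;
	unsigned long args[6];
};

/* What perf_event_set_bpf_prog() checks the offset against. */
struct fake_syscall_trace_enter {
	struct fake_trace_entry ent;
	int nr;
	unsigned long args[];
};

/* What the program is really handed: syscall_nr as unsigned long ... */
struct fake_syscall_tp_old {
	struct fake_trace_entry ent;
	unsigned long syscall_nr;
	unsigned long args[6];
} __attribute__((aligned(8)));

/* ... and with syscall_nr changed to int, as asked above. */
struct fake_syscall_tp_new {
	struct fake_trace_entry ent;
	int syscall_nr;
	unsigned long args[6];
} __attribute__((aligned(8)));

static size_t raw_args_off(void)
{
	return offsetof(struct fake_raw_sys_enter, args);
}

static size_t trace_enter_args_off(void)
{
	return offsetof(struct fake_syscall_trace_enter, args);
}

static size_t tp_old_args_off(void)
{
	return offsetof(struct fake_syscall_tp_old, args);
}

static size_t tp_new_args_off(void)
{
	return offsetof(struct fake_syscall_tp_new, args);
}
```

On such a target args sits at offset 24 in the raw struct and in the
unsigned long variant of syscall_tp_t, but at 16 in syscall_trace_enter.
Switching syscall_nr to int moves args in syscall_tp_t to 16 as well.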

-- 
 Artem




Re: [RFC PATCH] tracing: change syscall number type in struct syscall_trace_*

2023-10-04 Thread Artem Savkov
On Tue, Oct 03, 2023 at 09:38:44PM -0400, Steven Rostedt wrote:
> On Mon,  2 Oct 2023 15:52:42 +0200
> Artem Savkov  wrote:
> 
> > linux-rt-devel tree contains a patch that adds an extra member to struct
> > trace_entry. This causes the offset of args field in struct
> > trace_event_raw_sys_enter to be different from the one in struct
> > syscall_trace_enter:
> 
> This patch looks like it's fixing the symptom and not the issue. No code
> should rely on the two event structures to be related. That's an unwanted
> coupling, that will likely cause issues down the road (like the RT patch
> you mentioned).

I agree, but I didn't see a better solution, and that was my way of
starting a conversation, hence the RFC.

> > 
> > struct trace_event_raw_sys_enter {
> >         struct trace_entry         ent;                  /*     0    12 */
> >
> >         /* XXX last struct has 3 bytes of padding */
> >         /* XXX 4 bytes hole, try to pack */
> >
> >         long int                   id;                   /*    16     8 */
> >         long unsigned int          args[6];              /*    24    48 */
> >         /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> >         char                       __data[];             /*    72     0 */
> >
> >         /* size: 72, cachelines: 2, members: 4 */
> >         /* sum members: 68, holes: 1, sum holes: 4 */
> >         /* paddings: 1, sum paddings: 3 */
> >         /* last cacheline: 8 bytes */
> > };
> >
> > struct syscall_trace_enter {
> >         struct trace_entry         ent;                  /*     0    12 */
> >
> >         /* XXX last struct has 3 bytes of padding */
> >
> >         int                        nr;                   /*    12     4 */
> >         long unsigned int          args[];               /*    16     0 */
> >
> >         /* size: 16, cachelines: 1, members: 3 */
> >         /* paddings: 1, sum paddings: 3 */
> >         /* last cacheline: 16 bytes */
> > };
> >
> > This, in turn, causes perf_event_set_bpf_prog() to fail while running bpf
> > test_profiler testcase because max_ctx_offset is calculated based on the
> > former struct, while off is based on the latter:
> 
> The above appears to be pointing to the real bug. The "is calculated based
> on the former struct while off on the latter" Why are the two being used
> together? They are supposed to be *unrelated*!
> 
> 
> > 
> >         if (is_tracepoint || is_syscall_tp) {
> >                 int off = trace_event_get_offsets(event->tp_event);
> 
> So basically this is clumping together the raw_syscalls with the syscalls
> events as if they are the same. But they are not. They are created
> differently. It's basically like using one structure to get the offsets of
> another structure. That would be a bug anyplace else in the kernel. Sounds
> like it's a bug here too.
> 
> I think the issue is with this code, not the tracing code.
> 
> We could expose the struct syscall_trace_enter and syscall_trace_exit if
> the offsets to those are needed.

I don't think we need the syscall_trace_* offsets; it looks like
trace_event_get_offsets() should return the offset for
trace_event_raw_sys_enter instead. I am still trying to figure out how all
of this works together.
Maybe Alexei or Andrii have more context here.

-- 
 Artem




Re: [RFC PATCH] tracing: change syscall number type in struct syscall_trace_*

2023-10-04 Thread Artem Savkov
On Tue, Oct 03, 2023 at 03:11:15PM -0700, Andrii Nakryiko wrote:
> On Mon, Oct 2, 2023 at 6:53 AM Artem Savkov  wrote:
> >
> > linux-rt-devel tree contains a patch that adds an extra member to struct
> 
> can you please point to the patch itself that makes that change?

Of course, some context would be useful. The patch in question is b1773eac3f29c
("sched: Add support for lazy preemption") from rt-devel tree [0]. It came up
a couple of times before: [1] [2] [3] [4].

[0] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?id=b1773eac3f29cbdcdfd16e0339f1a164066e9f71
[1] https://lore.kernel.org/linux-rt-users/20200221153541.681468-1-jo...@kernel.org/t/#u
[2] https://github.com/iovisor/bpftrace/commit/a2e3d5dbc03ceb49b776cf5602d31896158844a7
[3] https://lore.kernel.org/bpf/xunyjzy64q9b@redhat.com/t/#u
[4] https://lore.kernel.org/bpf/20230727150647.397626-1-ykali...@redhat.com/t/#u

> > trace_entry. This causes the offset of args field in struct
> > trace_event_raw_sys_enter to be different from the one in struct
> > syscall_trace_enter:
> >
> > struct trace_event_raw_sys_enter {
> >         struct trace_entry         ent;                  /*     0    12 */
> >
> >         /* XXX last struct has 3 bytes of padding */
> >         /* XXX 4 bytes hole, try to pack */
> >
> >         long int                   id;                   /*    16     8 */
> >         long unsigned int          args[6];              /*    24    48 */
> >         /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> >         char                       __data[];             /*    72     0 */
> >
> >         /* size: 72, cachelines: 2, members: 4 */
> >         /* sum members: 68, holes: 1, sum holes: 4 */
> >         /* paddings: 1, sum paddings: 3 */
> >         /* last cacheline: 8 bytes */
> > };
> >
> > struct syscall_trace_enter {
> >         struct trace_entry         ent;                  /*     0    12 */
> >
> >         /* XXX last struct has 3 bytes of padding */
> >
> >         int                        nr;                   /*    12     4 */
> >         long unsigned int          args[];               /*    16     0 */
> >
> >         /* size: 16, cachelines: 1, members: 3 */
> >         /* paddings: 1, sum paddings: 3 */
> >         /* last cacheline: 16 bytes */
> > };
> >
> > This, in turn, causes perf_event_set_bpf_prog() to fail while running bpf
> > test_profiler testcase because max_ctx_offset is calculated based on the
> > former struct, while off is based on the latter:
> >
> >         if (is_tracepoint || is_syscall_tp) {
> >                 int off = trace_event_get_offsets(event->tp_event);
> >
> >                 if (prog->aux->max_ctx_offset > off)
> >                         return -EACCES;
> >         }
> >
> > This patch changes the type of nr member in syscall_trace_* structs to
> > be long so that "args" offset is equal to that in struct
> > trace_event_raw_sys_enter.
> >
> > Signed-off-by: Artem Savkov 
> > ---
> >  kernel/trace/trace.h          | 4 ++--
> >  kernel/trace/trace_syscalls.c | 7 ++++---
> >  2 files changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 77debe53f07cf..cd1d24df85364 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -135,13 +135,13 @@ enum trace_type {
> >   */
> >  struct syscall_trace_enter {
> >  	struct trace_entry	ent;
> > -	int			nr;
> > +	long			nr;
> >  	unsigned long		args[];
> >  };
> >
> >  struct syscall_trace_exit {
> >  	struct trace_entry	ent;
> > -	int			nr;
> > +	long			nr;
> >  	long			ret;
> >  };
> >
> > diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> > index de753403cdafb..c26939119f2e4 100644
> > --- a/kernel/trace/trace_syscalls.c
> > +++ b/kernel/trace/trace_syscalls.c
> > @@ -101,7 +101,7 @@ find_syscall_meta(unsigned long syscall)
> >  	return NULL;
> >  }
> >
> > -static struct syscall_metadata *syscall_nr_to_meta(int nr)
> > +static struct syscall_metadata *syscall_nr_to_meta(long nr)
> >  {
> >  	if (IS_ENABLED(CONFIG_HAVE_SPARSE_SYSCALL_NR))
> >  		return xa_load(&syscalls_metadata_sparse, (unsigned long)nr);

[RFC PATCH] tracing: change syscall number type in struct syscall_trace_*

2023-10-02 Thread Artem Savkov
linux-rt-devel tree contains a patch that adds an extra member to struct
trace_entry. This causes the offset of args field in struct
trace_event_raw_sys_enter to be different from the one in struct
syscall_trace_enter:

struct trace_event_raw_sys_enter {
        struct trace_entry         ent;                  /*     0    12 */

        /* XXX last struct has 3 bytes of padding */
        /* XXX 4 bytes hole, try to pack */

        long int                   id;                   /*    16     8 */
        long unsigned int          args[6];              /*    24    48 */
        /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
        char                       __data[];             /*    72     0 */

        /* size: 72, cachelines: 2, members: 4 */
        /* sum members: 68, holes: 1, sum holes: 4 */
        /* paddings: 1, sum paddings: 3 */
        /* last cacheline: 8 bytes */
};

struct syscall_trace_enter {
        struct trace_entry         ent;                  /*     0    12 */

        /* XXX last struct has 3 bytes of padding */

        int                        nr;                   /*    12     4 */
        long unsigned int          args[];               /*    16     0 */

        /* size: 16, cachelines: 1, members: 3 */
        /* paddings: 1, sum paddings: 3 */
        /* last cacheline: 16 bytes */
};

This, in turn, causes perf_event_set_bpf_prog() to fail while running bpf
test_profiler testcase because max_ctx_offset is calculated based on the
former struct, while off is based on the latter:

        if (is_tracepoint || is_syscall_tp) {
                int off = trace_event_get_offsets(event->tp_event);

                if (prog->aux->max_ctx_offset > off)
                        return -EACCES;
        }

This patch changes the type of nr member in syscall_trace_* structs to
be long so that "args" offset is equal to that in struct
trace_event_raw_sys_enter.

Signed-off-by: Artem Savkov 
---
 kernel/trace/trace.h          | 4 ++--
 kernel/trace/trace_syscalls.c | 7 ++++---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 77debe53f07cf..cd1d24df85364 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -135,13 +135,13 @@ enum trace_type {
  */
 struct syscall_trace_enter {
 	struct trace_entry	ent;
-	int			nr;
+	long			nr;
 	unsigned long		args[];
 };
 
 struct syscall_trace_exit {
 	struct trace_entry	ent;
-	int			nr;
+	long			nr;
 	long			ret;
 };
 
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index de753403cdafb..c26939119f2e4 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -101,7 +101,7 @@ find_syscall_meta(unsigned long syscall)
 	return NULL;
 }
 
-static struct syscall_metadata *syscall_nr_to_meta(int nr)
+static struct syscall_metadata *syscall_nr_to_meta(long nr)
 {
 	if (IS_ENABLED(CONFIG_HAVE_SPARSE_SYSCALL_NR))
 		return xa_load(&syscalls_metadata_sparse, (unsigned long)nr);
@@ -132,7 +132,8 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
 	struct trace_entry *ent = iter->ent;
 	struct syscall_trace_enter *trace;
 	struct syscall_metadata *entry;
-	int i, syscall;
+	int i;
+	long syscall;
 
 	trace = (typeof(trace))ent;
 	syscall = trace->nr;
@@ -177,7 +178,7 @@ print_syscall_exit(struct trace_iterator *iter, int flags,
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *ent = iter->ent;
 	struct syscall_trace_exit *trace;
-	int syscall;
+	long syscall;
 	struct syscall_metadata *entry;
 
 	trace = (typeof(trace))ent;
-- 
2.41.0




Re: [PATCH] kbuild: add extra-y to targets-for-modules

2020-12-17 Thread Artem Savkov
On Thu, Dec 17, 2020 at 05:26:07PM +0900, Masahiro Yamada wrote:
> On Thu, Dec 17, 2020 at 8:04 AM Joe Lawrence  wrote:
> >
> > On 12/16/20 1:14 AM, Masahiro Yamada wrote:
> > > On Tue, Dec 8, 2020 at 11:31 PM Artem Savkov wrote:
> > >>
> > >> On Tue, Dec 08, 2020 at 05:20:35PM +0800, WANG Chao wrote:
> > >>> Sorry for the late reply.
> > >>>
> > >>> On 11/25/20 at 10:42P, Masahiro Yamada wrote:
> > >>>> On Tue, Nov 24, 2020 at 12:05 AM WANG Chao  wrote:
> > >>>>>
> > >>>>> On 11/23/20 at 02:23P, Masahiro Yamada wrote:
> > >>>>>> On Tue, Nov 3, 2020 at 3:23 PM WANG Chao  wrote:
> > >>>>>>>
> > >>>>>>> extra-y target doesn't build for 'make M=...' since commit 6212804f2d78
> > >>>>>>> ("kbuild: do not create built-in objects for external module builds").
> > >>>>>>>
> > >>>>>>> This especially breaks kpatch, which is using 'extra-y := kpatch.lds'
> > >>>>>>> and 'make M=...' to build livepatch patch module.
> > >>>>>>>
> > >>>>>>> Add extra-y to targets-for-modules so that such kind of build works
> > >>>>>>> properly.
> > >>>>>>>
> > >>>>>>> Signed-off-by: WANG Chao 
> > >>>>>>> ---
> > >>>>>>>   scripts/Makefile.build | 2 +-
> > >>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>>>>>
> > >>>>>>> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > >>>>>>> index ae647379b579..0113a042d643 100644
> > >>>>>>> --- a/scripts/Makefile.build
> > >>>>>>> +++ b/scripts/Makefile.build
> > >>>>>>> @@ -86,7 +86,7 @@ ifdef need-builtin
> > >>>>>>>   targets-for-builtin += $(obj)/built-in.a
> > >>>>>>>   endif
> > >>>>>>>
> > >>>>>>> -targets-for-modules := $(patsubst %.o, %.mod, $(filter %.o, $(obj-m)))
> > >>>>>>> +targets-for-modules := $(extra-y) $(patsubst %.o, %.mod, $(filter %.o, $(obj-m)))
> > >>>>>>>
> > >>>>>>>   ifdef need-modorder
> > >>>>>>>   targets-for-modules += $(obj)/modules.order
> > >>>>>>> --
> > >>>>>>> 2.29.1
> > >>>>>>>
> > >>>>>>
> > >>>>>> NACK.
> > >>>>>>
> > >>>>>> Please fix your Makefile.
> > >>>>>>
> > >>>>>> Hint:
> > >>>>>> https://patchwork.kernel.org/project/linux-kbuild/patch/20201123045403.63402-6-masahi...@kernel.org/
> > >>>>>>
> > >>>>>>
> > >>>>>> Probably what you should use is 'targets'.
> > >>>>>
> > >>>>> I tried with 'targets' and 'always-y'. Neither works for me.
> > >>>>>
> > >>>>> I narrowed it down to the following example:
> > >>>>>
> > >>>>> cat > Makefile << _EOF_
> > >>>>> obj-m += foo.o
> > >>>>>
> > >>>>> ldflags-y += -T $(src)/kpatch.lds
> > >>>>> always-y += kpatch.lds
> > >>>>>
> > >>>>> foo-objs += bar.o
> > >>>>>
> > >>>>> all:
> > >>>>> 	make -C /lib/modules/$(shell uname -r)/build M=$(PWD)
> > >>>>> clean:
> > >>>>> 	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
> > >>>>> _EOF_
> > >>>>>
> > >>>>> Take a look into scripts/Makefile.build:488:
> > >>>>>
> > >>>>> __build: $(if $(KBUILD_BUILTIN), $(targets-for-builtin)) \
> > >>>>>   $(if $(KBUILD_MODULES), $(targets-for-modules)) \
> > >>>>>   $(subdir-ym) $(always-y)
> > >

Re: [PATCH] kbuild: add extra-y to targets-for-modules

2020-12-08 Thread Artem Savkov
On Tue, Dec 08, 2020 at 05:20:35PM +0800, WANG Chao wrote:
> Sorry for the late reply.
> 
> On 11/25/20 at 10:42P, Masahiro Yamada wrote:
> > On Tue, Nov 24, 2020 at 12:05 AM WANG Chao  wrote:
> > >
> > > On 11/23/20 at 02:23P, Masahiro Yamada wrote:
> > > > On Tue, Nov 3, 2020 at 3:23 PM WANG Chao  wrote:
> > > > >
> > > > > extra-y target doesn't build for 'make M=...' since commit 6212804f2d78
> > > > > ("kbuild: do not create built-in objects for external module builds").
> > > > >
> > > > > This especially breaks kpatch, which is using 'extra-y := kpatch.lds'
> > > > > and 'make M=...' to build livepatch patch module.
> > > > >
> > > > > Add extra-y to targets-for-modules so that such kind of build works
> > > > > properly.
> > > > >
> > > > > Signed-off-by: WANG Chao 
> > > > > ---
> > > > >  scripts/Makefile.build | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > > > > index ae647379b579..0113a042d643 100644
> > > > > --- a/scripts/Makefile.build
> > > > > +++ b/scripts/Makefile.build
> > > > > @@ -86,7 +86,7 @@ ifdef need-builtin
> > > > >  targets-for-builtin += $(obj)/built-in.a
> > > > >  endif
> > > > >
> > > > > -targets-for-modules := $(patsubst %.o, %.mod, $(filter %.o, $(obj-m)))
> > > > > +targets-for-modules := $(extra-y) $(patsubst %.o, %.mod, $(filter %.o, $(obj-m)))
> > > > >
> > > > >  ifdef need-modorder
> > > > >  targets-for-modules += $(obj)/modules.order
> > > > > --
> > > > > 2.29.1
> > > > >
> > > >
> > > > NACK.
> > > >
> > > > Please fix your Makefile.
> > > >
> > > > Hint:
> > > > https://patchwork.kernel.org/project/linux-kbuild/patch/20201123045403.63402-6-masahi...@kernel.org/
> > > >
> > > >
> > > > Probably what you should use is 'targets'.
> > >
> > > I tried with 'targets' and 'always-y'. Neither works for me.
> > >
> > > I narrowed it down to the following example:
> > >
> > > cat > Makefile << _EOF_
> > > obj-m += foo.o
> > >
> > > ldflags-y += -T $(src)/kpatch.lds
> > > always-y += kpatch.lds
> > >
> > > foo-objs += bar.o
> > >
> > > all:
> > > 	make -C /lib/modules/$(shell uname -r)/build M=$(PWD)
> > > clean:
> > > 	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
> > > _EOF_
> > >
> > > Take a look into scripts/Makefile.build:488:
> > >
> > > __build: $(if $(KBUILD_BUILTIN), $(targets-for-builtin)) \
> > >  $(if $(KBUILD_MODULES), $(targets-for-modules)) \
> > >  $(subdir-ym) $(always-y)
> > > @:
> > >
> > > 'always-y' is built after 'targets-for-modules'. This makes
> > > 'targets-for-modules' fail because kpatch.lds isn't there.
> > 
> > 
> > Heh, you rely on the targets built from left to right,
> > and you have never thought Make supports the parallel option -j.
> 
> You're right. I missed that.
> 
> > 
> > 
> > You need to specify the dependency if you expect objects
> > are built in the particular order.
> > 
> > However, in this case, using ldflags-y looks wrong
> > in the first place.
> > 
> > The linker script is used when combining the object
> > as well as the final link of *.ko

We want the linker script to be used in both of those steps, otherwise
modpost fails.

It looks like the right thing to do here is to leave ldflags-y in, get rid
of always-y/extra-y altogether and specify our linker script as a
dependency for the object.

> I don't have a clean fix to kpatch right now.
> 
> I'm looping kpatch forks in. They're also looking at this right now:
> 
> https://github.com/dynup/kpatch/pull/1149
> 
> Thanks
> WANG Chao
> 
> > 
> > 
> > > As for 'targets', it does not seem to be useful in the OOT (out-of-tree) case.
> > >
> > > What change do you suggest to make to fix this kind of Makefile?
> > >
> > > Thanks,
> > > WANG Chao
> > 
> > 
> > 
> > -- 
> > Best Regards
> > Masahiro Yamada
> > 
> 

-- 
Regards,
  Artem Savkov


Re: [PATCH] pty: do tty_flip_buffer_push without port->lock in pty_write

2020-09-04 Thread Artem Savkov
Hello Sergey,

On Fri, Sep 04, 2020 at 04:43:33PM +0900, Sergey Senozhatsky wrote:
> On (20/09/01 14:01), Artem Savkov wrote:
> [..]
> > It looks like the commit was aimed to protect tty_insert_flip_string and
> > there is no need for tty_flip_buffer_push to be under this lock.
> >
> [..]
> > @@ -120,10 +120,10 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c)
> >  		spin_lock_irqsave(&to->port->lock, flags);
> >  		/* Stuff the data into the input queue of the other end */
> >  		c = tty_insert_flip_string(to->port, buf, c);
> > +		spin_unlock_irqrestore(&to->port->lock, flags);
> >  		/* And shovel */
> >  		if (c)
> >  			tty_flip_buffer_push(to->port);
> > -		spin_unlock_irqrestore(&to->port->lock, flags);
> 
> Performing unprotected
> 
>   smp_store_release(&buf->tail->commit, buf->tail->used);
> 
> does not look safe to me.
> 
> 
> This path can be called concurrently - "pty_write vs console's IRQ handler
> (TX/RX)", for instance.
> 
> Doing this
> 
>   queue_work(system_unbound_wq, &buf->work);
> 
> outside of port->lock scope also sounds like possible concurrent data
> modification.
> 
> I'm not sure I see how this patch is safe.

Yes, I do see how this might be unsafe, but the argument doesn't hold
well for console drivers other than 8250: most of them seem to call
tty_flip_buffer_push() outside of port->lock, and many even unlock and
then relock right around this call to avoid similar possible deadlocks.
Even 8250 itself used to do this "recently". After all, a potentially
corrupted console is better than a deadlock.

I know this is no excuse for adding unsafe code, but unfortunately I don't
see a better solution at the moment, although admittedly I am not very
familiar with the tty code.
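
For illustration, the unlock/relock pattern mentioned above can be modelled
in userspace roughly as follows. This is only a sketch under assumptions:
struct model_port, model_push() and model_rx_handler() are hypothetical
names, a pthread mutex stands in for port->lock, and model_push() stands in
for tty_flip_buffer_push(). It is not code from any driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tty port; not a kernel structure. */
struct model_port {
	pthread_mutex_t lock;   /* models port->lock */
	bool lock_held;         /* bookkeeping so the model can observe the lock */
	int pending;            /* bytes inserted but not yet pushed */
	int pushed;             /* bytes handed over to the consumer */
	bool pushed_unlocked;   /* did the last push run without the lock? */
};

static void model_lock(struct model_port *p)
{
	pthread_mutex_lock(&p->lock);
	p->lock_held = true;
}

static void model_unlock(struct model_port *p)
{
	p->lock_held = false;
	pthread_mutex_unlock(&p->lock);
}

/*
 * Models the push step. It touches shared state without the lock,
 * which is exactly the concern raised in the review above.
 */
static void model_push(struct model_port *p)
{
	p->pushed_unlocked = !p->lock_held;
	p->pushed += p->pending;
	p->pending = 0;
}

/* Called with the lock held and returns with it held again. */
static void model_rx_handler(struct model_port *p, int nbytes)
{
	assert(p->lock_held && nbytes >= 0);
	p->pending += nbytes;   /* insert under the lock */
	model_unlock(p);        /* drop the lock around the push */
	model_push(p);
	model_lock(p);          /* relock before returning to the caller */
}
```

The model only records ordering; it says nothing about whether the
unprotected push is actually safe against a concurrent writer.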

-- 
 Artem



[PATCH v2] pty: do tty_flip_buffer_push without port->lock in pty_write

2020-09-02 Thread Artem Savkov
b6da31b2c07c "tty: Fix data race in tty_insert_flip_string_fixed_flag"
puts tty_flip_buffer_push under port->lock, introducing the following
possible circular locking dependency:

[30129.876566] ==
[30129.876566] WARNING: possible circular locking dependency detected
[30129.876567] 5.9.0-rc2+ #3 Tainted: G S  W
[30129.876568] --
[30129.876568] sysrq.sh/1222 is trying to acquire lock:
[30129.876569] 92c39480 (console_owner){}-{0:0}, at: 
console_unlock+0x3fe/0xa90

[30129.876572] but task is already holding lock:
[30129.876572] 888107cb9018 (>lock/1){-.-.}-{2:2}, at: 
show_workqueue_state.cold.55+0x15b/0x6ca

[30129.876576] which lock already depends on the new lock.

[30129.876577] the existing dependency chain (in reverse order) is:

[30129.876578] -> #3 (>lock/1){-.-.}-{2:2}:
[30129.876581]_raw_spin_lock+0x30/0x70
[30129.876581]__queue_work+0x1a3/0x10f0
[30129.876582]queue_work_on+0x78/0x80
[30129.876582]pty_write+0x165/0x1e0
[30129.876583]n_tty_write+0x47f/0xf00
[30129.876583]tty_write+0x3d6/0x8d0
[30129.876584]vfs_write+0x1a8/0x650

[30129.876588] -> #2 (>lock#2){-.-.}-{2:2}:
[30129.876590]_raw_spin_lock_irqsave+0x3b/0x80
[30129.876591]tty_port_tty_get+0x1d/0xb0
[30129.876592]tty_port_default_wakeup+0xb/0x30
[30129.876592]serial8250_tx_chars+0x3d6/0x970
[30129.876593]serial8250_handle_irq.part.12+0x216/0x380
[30129.876593]serial8250_default_handle_irq+0x82/0xe0
[30129.876594]serial8250_interrupt+0xdd/0x1b0
[30129.876595]__handle_irq_event_percpu+0xfc/0x850

[30129.876602] -> #1 (>lock){-.-.}-{2:2}:
[30129.876605]_raw_spin_lock_irqsave+0x3b/0x80
[30129.876605]serial8250_console_write+0x12d/0x900
[30129.876606]console_unlock+0x679/0xa90
[30129.876606]register_console+0x371/0x6e0
[30129.876607]univ8250_console_init+0x24/0x27
[30129.876607]console_init+0x2f9/0x45e

[30129.876609] -> #0 (console_owner){}-{0:0}:
[30129.876611]__lock_acquire+0x2f70/0x4e90
[30129.876612]lock_acquire+0x1ac/0xad0
[30129.876612]console_unlock+0x460/0xa90
[30129.876613]vprintk_emit+0x130/0x420
[30129.876613]printk+0x9f/0xc5
[30129.876614]show_pwq+0x154/0x618
[30129.876615]show_workqueue_state.cold.55+0x193/0x6ca
[30129.876615]__handle_sysrq+0x244/0x460
[30129.876616]write_sysrq_trigger+0x48/0x4a
[30129.876616]proc_reg_write+0x1a6/0x240
[30129.876617]vfs_write+0x1a8/0x650

[30129.876619] other info that might help us debug this:

[30129.876620] Chain exists of:
[30129.876621]   console_owner --> >lock#2 --> >lock/1

[30129.876625]  Possible unsafe locking scenario:

[30129.876626]CPU0CPU1
[30129.876626]
[30129.876627]   lock(>lock/1);
[30129.876628]lock(>lock#2);
[30129.876630]lock(>lock/1);
[30129.876631]   lock(console_owner);

[30129.876633]  *** DEADLOCK ***

[30129.876634] 5 locks held by sysrq.sh/1222:
[30129.876634]  #0: 8881d3ce0470 (sb_writers#3){.+.+}-{0:0}, at: 
vfs_write+0x359/0x650
[30129.876637]  #1: 92c612c0 (rcu_read_lock){}-{1:2}, at: 
__handle_sysrq+0x4d/0x460
[30129.876640]  #2: 92c612c0 (rcu_read_lock){}-{1:2}, at: 
show_workqueue_state+0x5/0xf0
[30129.876642]  #3: 888107cb9018 (>lock/1){-.-.}-{2:2}, at: 
show_workqueue_state.cold.55+0x15b/0x6ca
[30129.876645]  #4: 92c39980 (console_lock){+.+.}-{0:0}, at: 
vprintk_emit+0x123/0x420

[30129.876648] stack backtrace:
[30129.876649] CPU: 3 PID: 1222 Comm: sysrq.sh Tainted: G S  W 
5.9.0-rc2+ #3
[30129.876649] Hardware name: Intel Corporation 2012 Client Platform/Emerald 
Lake 2, BIOS ACRVMBY1.86C.0078.P00.1201161002 01/16/2012
[30129.876650] Call Trace:
[30129.876650]  dump_stack+0x9d/0xe0
[30129.876651]  check_noncircular+0x34f/0x410
[30129.876653]  __lock_acquire+0x2f70/0x4e90
[30129.876656]  lock_acquire+0x1ac/0xad0
[30129.876658]  console_unlock+0x460/0xa90
[30129.876660]  vprintk_emit+0x130/0x420
[30129.876660]  printk+0x9f/0xc5
[30129.876661]  show_pwq+0x154/0x618
[30129.876662]  show_workqueue_state.cold.55+0x193/0x6ca
[30129.876664]  __handle_sysrq+0x244/0x460
[30129.876665]  write_sysrq_trigger+0x48/0x4a
[30129.876665]  proc_reg_write+0x1a6/0x240
[30129.87]  vfs_write+0x1a8/0x650

It looks like the commit was aimed at protecting tty_insert_flip_string, and
there is no need for tty_flip_buffer_push to be under this lock.
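
The dependency chain in the report above can be checked with a small model.
This is a sketch, not lockdep itself: the lock class names below are my
reading of the call stacks (the class names are truncated in this archive),
and record_order() and would_deadlock() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed lock classes, inferred from the stacks in the report. */
enum lock_class {
	CONSOLE_OWNER,
	UART_PORT_LOCK,   /* taken in serial8250_console_write() */
	TTY_PORT_LOCK,    /* taken in tty_port_tty_get() and pty_write() */
	POOL_LOCK,        /* taken in __queue_work() */
	NR_CLASSES
};

/* held_before[a][b] is true if b was ever acquired while a was held. */
struct lock_graph {
	bool held_before[NR_CLASSES][NR_CLASSES];
};

static void record_order(struct lock_graph *g, int held, int wanted)
{
	g->held_before[held][wanted] = true;
}

/* Depth-first search: is 'to' reachable from 'from' in the graph? */
static bool reaches(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->held_before[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	return false;
}

/* Acquiring 'wanted' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'wanted'. */
static bool would_deadlock(const struct lock_graph *g, int held, int wanted)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(wanted >= 0 && wanted < NR_CLASSES);
	return reaches(g, wanted, held, seen);
}
```

In this model, recording entries #1 to #3 of the chain and then asking for
console_owner while holding the pool lock reports a cycle. Leaving out the
"tty port lock held while queueing work" edge, which is what moving
tty_flip_buffer_push out of the locked region amounts to, breaks it.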

Fixes: b6da31b2c07c ("tty: Fix data race in tty_insert_flip_string_fixed_flag")
Signed-off-by: Artem Savkov 
Acked-by: Jiri Slaby 
---

v2: trimmed stack traces in commit message.

 drivers/tty/pty.c | 2 +-
 1 file c

[PATCH] pty: do tty_flip_buffer_push without port->lock in pty_write

2020-09-01 Thread Artem Savkov
9.876649] CPU: 3 PID: 1222 Comm: sysrq.sh Tainted: G S  W 
5.9.0-rc2+ #3
[30129.876649] Hardware name: Intel Corporation 2012 Client Platform/Emerald 
Lake 2, BIOS ACRVMBY1.86C.0078.P00.1201161002 01/16/2012
[30129.876650] Call Trace:
[30129.876650]  dump_stack+0x9d/0xe0
[30129.876651]  check_noncircular+0x34f/0x410
[30129.876652]  ? print_circular_bug+0x360/0x360
[30129.876652]  ? mark_lock+0x144/0x19e0
[30129.876653]  ? sched_clock+0x5/0x10
[30129.876653]  __lock_acquire+0x2f70/0x4e90
[30129.876654]  ? lockdep_hardirqs_on_prepare+0x4e0/0x4e0
[30129.876654]  ? sched_clock+0x5/0x10
[30129.876655]  ? sched_clock_cpu+0x18/0x1d0
[30129.876655]  ? find_held_lock+0x3a/0x1c0
[30129.876656]  lock_acquire+0x1ac/0xad0
[30129.876656]  ? console_unlock+0x3fe/0xa90
[30129.876657]  ? lock_downgrade+0x730/0x730
[30129.876657]  ? rcu_read_unlock+0x50/0x50
[30129.876658]  console_unlock+0x460/0xa90
[30129.876658]  ? console_unlock+0x3fe/0xa90
[30129.876659]  ? __down_trylock_console_sem+0x76/0x80
[30129.876660]  vprintk_emit+0x130/0x420
[30129.876660]  printk+0x9f/0xc5
[30129.876661]  ? kmsg_dump_rewind_nolock+0xd9/0xd9
[30129.876661]  show_pwq+0x154/0x618
[30129.876662]  show_workqueue_state.cold.55+0x193/0x6ca
[30129.876662]  ? printk+0x9f/0xc5
[30129.876663]  ? print_worker_info+0x260/0x260
[30129.876663]  ? debug_show_all_locks+0x1f2/0x209
[30129.876664]  __handle_sysrq+0x244/0x460
[30129.876665]  write_sysrq_trigger+0x48/0x4a
[30129.876665]  proc_reg_write+0x1a6/0x240
[30129.87]  vfs_write+0x1a8/0x650
[30129.87]  ksys_write+0xf1/0x1c0
[30129.876667]  ? __ia32_sys_read+0xb0/0xb0
[30129.876667]  ? syscall_enter_from_user_mode+0x2a/0x2b0
[30129.876668]  do_syscall_64+0x33/0x40
[30129.876669]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[30129.876669] RIP: 0033:0x7f0446ab28a8
[30129.876671] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 
0f 1e fa 48 8d 05 b5 4c 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[30129.876671] RSP: 002b:7fff991890c8 EFLAGS: 0246 ORIG_RAX: 
0001
[30129.876672] RAX: ffda RBX: 0002 RCX: 7f0446ab28a8
[30129.876673] RDX: 0002 RSI: 55887dae8e00 RDI: 0001
[30129.876674] RBP: 55887dae8e00 R08: 000a R09: 7f0446b42d40
[30129.876674] R10: 000a R11: 0246 R12: 7f0446d836c0
[30129.876675] R13: 0002 R14: 7f0446d7e880 R15: 0002

It looks like the commit was aimed at protecting tty_insert_flip_string, and
there is no need for tty_flip_buffer_push to be under this lock.

Fixes: b6da31b2c07c ("tty: Fix data race in tty_insert_flip_string_fixed_flag")
Signed-off-by: Artem Savkov 
---
 drivers/tty/pty.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 00099a8439d2..c6a1d8c4e689 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -120,10 +120,10 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c)
 		spin_lock_irqsave(&to->port->lock, flags);
 		/* Stuff the data into the input queue of the other end */
 		c = tty_insert_flip_string(to->port, buf, c);
+		spin_unlock_irqrestore(&to->port->lock, flags);
 		/* And shovel */
 		if (c)
 			tty_flip_buffer_push(to->port);
-		spin_unlock_irqrestore(&to->port->lock, flags);
 	}
 	return c;
 }
-- 
2.26.2



[tip:x86/build] x86/tools/relocs: Fix big section header tables

2019-04-19 Thread tip-bot for Artem Savkov
Commit-ID:  f36e7495dd3990d6848e6d6703c78f1f17a97538
Gitweb: https://git.kernel.org/tip/f36e7495dd3990d6848e6d6703c78f1f17a97538
Author: Artem Savkov 
AuthorDate: Thu, 29 Nov 2018 16:56:15 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 19 Apr 2019 20:54:07 +0200

x86/tools/relocs: Fix big section header tables

When the number of entries in the section header table is larger than
or equal to SHN_LORESERVE, the size of the table is held in the sh_size
member of the initial entry in the section header table instead of e_shnum.
The same applies to the string table index, which is located in sh_link
instead of e_shstrndx.

This case is easily reproducible with KCFLAGS="-ffunction-sections":
the bzImage build fails with a "String table index out of bounds" error.
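
The lookup described above can be sketched in isolation as follows. This is
a minimal illustration, not the relocs.c code: it assumes a 64-bit ELF
already converted to host byte order, uses the glibc <elf.h> definitions,
and struct shdr_table_info and resolve_shdr_table() are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the real section count and string table index. */
struct shdr_table_info {
	uint64_t shnum;
	uint32_t shstrndx;
};

/*
 * 'first' is the section header at index 0. The caller is assumed to
 * have checked that the file has a section header table at all.
 */
static struct shdr_table_info
resolve_shdr_table(const Elf64_Ehdr *ehdr, const Elf64_Shdr *first)
{
	struct shdr_table_info info;

	assert(ehdr != NULL && first != NULL);

	info.shnum = ehdr->e_shnum;
	info.shstrndx = ehdr->e_shstrndx;

	/* e_shnum == 0 (SHN_UNDEF): the real count lives in sh_size. */
	if (info.shnum == SHN_UNDEF)
		info.shnum = first->sh_size;

	/* e_shstrndx == SHN_XINDEX: the real index lives in sh_link. */
	if (info.shstrndx == SHN_XINDEX)
		info.shstrndx = first->sh_link;

	return info;
}
```

For a small object both header fields are used directly; only when they
hold the escape values is the initial section header consulted.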

Signed-off-by: Artem Savkov 
Reviewed-by: Josh Poimboeuf 
Acked-by: Joe Lawrence 
Cc: Eric W . Biederman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20181129155615.2594-1-asav...@redhat.com
[ Simplify the die() lines. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/tools/relocs.c | 74 ++---
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index b629f6992d9f..f345586f5e50 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -11,7 +11,9 @@
 #define Elf_Shdr   ElfW(Shdr)
 #define Elf_Sym		ElfW(Sym)
 
-static Elf_Ehdr ehdr;
+static Elf_Ehdr		ehdr;
+static unsigned long	shnum;
+static unsigned int	shstrndx;
 
 struct relocs {
uint32_t*offset;
@@ -241,9 +243,9 @@ static const char *sec_name(unsigned shndx)
 {
const char *sec_strtab;
const char *name;
-   sec_strtab = secs[ehdr.e_shstrndx].strtab;
+   sec_strtab = secs[shstrndx].strtab;
name = "";
-   if (shndx < ehdr.e_shnum) {
+   if (shndx < shnum) {
name = sec_strtab + secs[shndx].shdr.sh_name;
}
else if (shndx == SHN_ABS) {
@@ -271,7 +273,7 @@ static const char *sym_name(const char *sym_strtab, Elf_Sym *sym)
 static Elf_Sym *sym_lookup(const char *symname)
 {
 	int i;
-	for (i = 0; i < ehdr.e_shnum; i++) {
+	for (i = 0; i < shnum; i++) {
 		struct section *sec = &secs[i];
long nsyms;
char *strtab;
@@ -366,27 +368,41 @@ static void read_ehdr(FILE *fp)
ehdr.e_shnum = elf_half_to_cpu(ehdr.e_shnum);
ehdr.e_shstrndx  = elf_half_to_cpu(ehdr.e_shstrndx);
 
-   if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
+   shnum = ehdr.e_shnum;
+   shstrndx = ehdr.e_shstrndx;
+
+   if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN))
die("Unsupported ELF header type\n");
-   }
-   if (ehdr.e_machine != ELF_MACHINE) {
+   if (ehdr.e_machine != ELF_MACHINE)
die("Not for %s\n", ELF_MACHINE_NAME);
-   }
-   if (ehdr.e_version != EV_CURRENT) {
+   if (ehdr.e_version != EV_CURRENT)
die("Unknown ELF version\n");
-   }
-   if (ehdr.e_ehsize != sizeof(Elf_Ehdr)) {
+   if (ehdr.e_ehsize != sizeof(Elf_Ehdr))
die("Bad Elf header size\n");
-   }
-   if (ehdr.e_phentsize != sizeof(Elf_Phdr)) {
+   if (ehdr.e_phentsize != sizeof(Elf_Phdr))
die("Bad program header entry\n");
-   }
-   if (ehdr.e_shentsize != sizeof(Elf_Shdr)) {
+   if (ehdr.e_shentsize != sizeof(Elf_Shdr))
die("Bad section header entry\n");
+
+
+	if (shnum == SHN_UNDEF || shstrndx == SHN_XINDEX) {
+		Elf_Shdr shdr;
+
+		if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0)
+			die("Seek to %d failed: %s\n", ehdr.e_shoff, strerror(errno));
+
+		if (fread(&shdr, sizeof(shdr), 1, fp) != 1)
+			die("Cannot read initial ELF section header: %s\n", strerror(errno));
+
+		if (shnum == SHN_UNDEF)
+			shnum = elf_xword_to_cpu(shdr.sh_size);
+
+		if (shstrndx == SHN_XINDEX)
+			shstrndx = elf_word_to_cpu(shdr.sh_link);
 	}
-   if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+
+   if (shstrndx >= shnum)
die("String table index out of bounds\n");
-   }
 }
 
 static void read_shdrs(FILE *fp)
@@ -394,20 +410,20 @@ static void read_shdrs(FILE *fp)
int i;
Elf_Shdr shdr;
 
-   secs = calloc(ehdr.e_shnum, sizeof(struct section));
+   secs = calloc(shnum, sizeof(struct section));
if (!secs) {
die("Unable to allocate %d section headers\n",
-   ehdr.e_shnum);
+   shnum);
}
if (fseek(fp, ehdr.e_shoff, SE

Re: [PATCH v3 2/9] kbuild: Support for Symbols.list creation

2019-04-11 Thread Artem Savkov
On Wed, Apr 10, 2019 at 11:50:51AM -0400, Joe Lawrence wrote:
> -clean: archclean vmlinuxclean
> +klpclean:
> + $(Q) rm -f $(objtree)/Symbols.list

nit: $(SLIST) can be used here.

> +clean: archclean vmlinuxclean klpclean
>  
>  # mrproper - Delete all generated files, including .config
>  #
> diff --git a/samples/livepatch/Makefile b/samples/livepatch/Makefile
> index 2472ce39a18d..8b9b42a258ad 100644
> --- a/samples/livepatch/Makefile
> +++ b/samples/livepatch/Makefile
> @@ -1,3 +1,4 @@
> +LIVEPATCH_livepatch-sample := y
>  obj-$(CONFIG_SAMPLE_LIVEPATCH) += livepatch-sample.o
>  obj-$(CONFIG_SAMPLE_LIVEPATCH) += livepatch-shadow-mod.o
>  obj-$(CONFIG_SAMPLE_LIVEPATCH) += livepatch-shadow-fix1.o
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index 76ca30cc4791..ca76bd2080f0 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -246,6 +246,11 @@ cmd_gen_ksymdeps = \
>   $(CONFIG_SHELL) $(srctree)/scripts/gen_ksymdeps.sh $@ >> 
> $(dot-target).cmd
>  endif
>  
> +ifdef CONFIG_LIVEPATCH
> +cmd_livepatch = $(if $(LIVEPATCH_$(basetarget)), \
> + $(shell touch $(MODVERDIR)/$(basetarget).livepatch))
> +endif
> +
>  define rule_cc_o_c
>   $(call cmd,checksrc)
>   $(call cmd_and_fixdep,cc_o_c)
> @@ -280,6 +285,7 @@ $(obj)/%.o: $(src)/%.c $(recordmcount_source) 
> $(objtool_dep) FORCE
>  $(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) 
> $(objtool_dep) FORCE
>   $(call cmd,force_checksrc)
>   $(call if_changed_rule,cc_o_c)
> + $(call cmd_livepatch)

nit: maybe use "cmd,livepatch" to be consistent with the other call of
this function.

>   @{ echo $(@:.o=.ko); echo $@; \
>  $(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod)
>  
> @@ -456,6 +462,7 @@ cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(filter 
> %.o,$^) $(cmd_secanalysis
>  
>  $(multi-used-m): FORCE
>   $(call if_changed,link_multi-m)
> + $(call cmd,livepatch)
>   @{ echo $(@:.o=.ko); echo $(filter %.o,$^); \
>  $(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod)
>  $(call multi_depend, $(multi-used-m), .o, -objs -y -m)
> -- 
> 2.20.1
> 

-- 
 Artem


Re: mm: race in put_and_wait_on_page_locked()

2019-02-05 Thread Artem Savkov
On Mon, Feb 04, 2019 at 12:42:50PM -0800, Hugh Dickins wrote:
> On Mon, 4 Feb 2019, Artem Savkov wrote:
> 
> > Hi Hugh,
> > 
> > Your recent patch 9a1ea439b16b "mm: put_and_wait_on_page_locked() while
> > page is migrated" seems to have introduced a race into page migration
> > process. I have a host that eagerly reproduces the following BUG under
> > stress:
> > 
> > [  302.847402] page:f0021700 count:0 mapcount:0 
> > mapping:c000b2710bb0 index:0x19
> > [  302.848096] xfs_address_space_operations [xfs] 
> > [  302.848100] name:"libc-2.28.so" 
> > [  302.848244] flags: 0x380006(referenced|uptodate)
> > [  302.848521] raw: 00380006 5deadbeef100 5deadbeef200 
> > 
> > [  302.848724] raw: 0019  0001 
> > c000bc0b1000
> > [  302.848919] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 
> > 0)
> > [  302.849076] page->mem_cgroup:c000bc0b1000
> > [  302.849269] [ cut here ]
> > [  302.849397] kernel BUG at include/linux/mm.h:546!
> > [  302.849586] Oops: Exception in kernel mode, sig: 5 [#1]
> > [  302.849711] LE SMP NR_CPUS=2048 NUMA pSeries
> > [  302.849839] Modules linked in: pseries_rng sunrpc xts vmx_crypto 
> > virtio_balloon xfs libcrc32c virtio_net net_failover virtio_console 
> > failover virtio_blk
> > [  302.850400] CPU: 3 PID: 8759 Comm: cc1 Not tainted 5.0.0-rc4+ #36
> > [  302.850571] NIP:  c039c8b8 LR: c039c8b4 CTR: 
> > c080a0e0
> > [  302.850758] REGS: c000b0d7f7e0 TRAP: 0700   Not tainted  (5.0.0-rc4+)
> > [  302.850952] MSR:  80029033   CR: 48024422  
> > XER: 
> > [  302.851150] CFAR: c03ff584 IRQMASK: 0 
> > [  302.851150] GPR00: c039c8b4 c000b0d7fa70 c1bcca00 
> > 0021 
> > [  302.851150] GPR04: c000b044c628 0007 55a0 
> > c1fc3760 
> > [  302.851150] GPR08: 0007  c000b0d7c000 
> > c000b0d7f5ff 
> > [  302.851150] GPR12: 4400 c0003fffae80  
> >  
> > [  302.851150] GPR16:    
> >  
> > [  302.851150] GPR20: c000689f5aa8 c0002a13ee48  
> > c1da29b0 
> > [  302.851150] GPR24: c1bf7d80 c000689f5a00  
> >  
> > [  302.851150] GPR28: c1bf9e80 c000b0d7fab8 0001 
> > f0021700 
> > [  302.852914] NIP [c039c8b8] 
> > put_and_wait_on_page_locked+0x398/0x3d0
> > [  302.853080] LR [c039c8b4] put_and_wait_on_page_locked+0x394/0x3d0
> > [  302.853235] Call Trace:
> > [  302.853305] [c000b0d7fa70] [c039c8b4] 
> > put_and_wait_on_page_locked+0x394/0x3d0 (unreliable)
> > [  302.853540] [c000b0d7fb10] [c047b838] 
> > __migration_entry_wait+0x178/0x250
> > [  302.853738] [c000b0d7fb50] [c040c928] 
> > do_swap_page+0xd78/0xf60
> > [  302.853997] [c000b0d7fbd0] [c0411078] 
> > __handle_mm_fault+0xbf8/0xe80
> > [  302.854187] [c000b0d7fcb0] [c0411548] 
> > handle_mm_fault+0x248/0x450
> > [  302.854379] [c000b0d7fd00] [c0078ca4] 
> > __do_page_fault+0x2d4/0xdf0
> > [  302.854877] [c000b0d7fde0] [c00797f8] do_page_fault+0x38/0xf0
> > [  302.855057] [c000b0d7fe20] [c000a7c4] 
> > handle_page_fault+0x18/0x38
> > [  302.855300] Instruction dump:
> > [  302.855432] 4bfffcf0 6000 3948 4bfffd20 6000 6000 
> > 3c82ff36 7fe3fb78 
> > [  302.855689] fb210068 38843b78 48062f09 6000 <0fe0> 6000 
> > 3b41 3b61 
> > [  302.855950] ---[ end trace a52140e0f9751ae0 ]---
> > 
> > What seems to be happening is that migrate_page_move_mapping() calls
> > page_ref_freeze() on another CPU somewhere between __migration_entry_wait()
> > taking a reference and wait_on_page_bit_common() calling put_page().
> 
> Thank you for reporting, Artem.
> 
> And see the mm thread https://marc.info/?l=linux-mm&m=154821775401218&w=2

Ah, thank you. Should have searched through linux-mm, not just lkml.

> That was on arm64, you are on power I think: both point towards xfs
> (Cai could not reproduce it on ext4), but that should not be taken too
> seriously - it could just be easier to reproduce on one than the other.
> 
> Your description in your last paragraph is what I imagined 

mm: race in put_and_wait_on_page_locked()

2019-02-04 Thread Artem Savkov
Hi Hugh,

Your recent patch 9a1ea439b16b "mm: put_and_wait_on_page_locked() while
page is migrated" seems to have introduced a race into the page migration
process. I have a host that eagerly reproduces the following BUG under
stress:

[  302.847402] page:f0021700 count:0 mapcount:0 
mapping:c000b2710bb0 index:0x19
[  302.848096] xfs_address_space_operations [xfs] 
[  302.848100] name:"libc-2.28.so" 
[  302.848244] flags: 0x380006(referenced|uptodate)
[  302.848521] raw: 00380006 5deadbeef100 5deadbeef200 

[  302.848724] raw: 0019  0001 
c000bc0b1000
[  302.848919] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
[  302.849076] page->mem_cgroup:c000bc0b1000
[  302.849269] [ cut here ]
[  302.849397] kernel BUG at include/linux/mm.h:546!
[  302.849586] Oops: Exception in kernel mode, sig: 5 [#1]
[  302.849711] LE SMP NR_CPUS=2048 NUMA pSeries
[  302.849839] Modules linked in: pseries_rng sunrpc xts vmx_crypto 
virtio_balloon xfs libcrc32c virtio_net net_failover virtio_console failover 
virtio_blk
[  302.850400] CPU: 3 PID: 8759 Comm: cc1 Not tainted 5.0.0-rc4+ #36
[  302.850571] NIP:  c039c8b8 LR: c039c8b4 CTR: c080a0e0
[  302.850758] REGS: c000b0d7f7e0 TRAP: 0700   Not tainted  (5.0.0-rc4+)
[  302.850952] MSR:  80029033   CR: 48024422  
XER: 
[  302.851150] CFAR: c03ff584 IRQMASK: 0 
[  302.851150] GPR00: c039c8b4 c000b0d7fa70 c1bcca00 
0021 
[  302.851150] GPR04: c000b044c628 0007 55a0 
c1fc3760 
[  302.851150] GPR08: 0007  c000b0d7c000 
c000b0d7f5ff 
[  302.851150] GPR12: 4400 c0003fffae80  
 
[  302.851150] GPR16:    
 
[  302.851150] GPR20: c000689f5aa8 c0002a13ee48  
c1da29b0 
[  302.851150] GPR24: c1bf7d80 c000689f5a00  
 
[  302.851150] GPR28: c1bf9e80 c000b0d7fab8 0001 
f0021700 
[  302.852914] NIP [c039c8b8] put_and_wait_on_page_locked+0x398/0x3d0
[  302.853080] LR [c039c8b4] put_and_wait_on_page_locked+0x394/0x3d0
[  302.853235] Call Trace:
[  302.853305] [c000b0d7fa70] [c039c8b4] 
put_and_wait_on_page_locked+0x394/0x3d0 (unreliable)
[  302.853540] [c000b0d7fb10] [c047b838] 
__migration_entry_wait+0x178/0x250
[  302.853738] [c000b0d7fb50] [c040c928] do_swap_page+0xd78/0xf60
[  302.853997] [c000b0d7fbd0] [c0411078] 
__handle_mm_fault+0xbf8/0xe80
[  302.854187] [c000b0d7fcb0] [c0411548] handle_mm_fault+0x248/0x450
[  302.854379] [c000b0d7fd00] [c0078ca4] __do_page_fault+0x2d4/0xdf0
[  302.854877] [c000b0d7fde0] [c00797f8] do_page_fault+0x38/0xf0
[  302.855057] [c000b0d7fe20] [c000a7c4] handle_page_fault+0x18/0x38
[  302.855300] Instruction dump:
[  302.855432] 4bfffcf0 6000 3948 4bfffd20 6000 6000 3c82ff36 
7fe3fb78 
[  302.855689] fb210068 38843b78 48062f09 6000 <0fe0> 6000 3b41 
3b61 
[  302.855950] ---[ end trace a52140e0f9751ae0 ]---

What seems to be happening is that migrate_page_move_mapping() calls
page_ref_freeze() on another CPU somewhere between __migration_entry_wait()
taking a reference and wait_on_page_bit_common() calling put_page().
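
To make the interaction concrete, here is a minimal userspace model of the
refcount primitives involved. It is a sketch under assumptions: the model_*
names are hypothetical, and the semantics are my understanding of
page_ref_freeze() (compare-and-swap from the expected count to 0),
get_page_unless_zero() and the zero-refcount check hit in the BUG above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct model_page {
	atomic_int refcount;
};

/* Freeze: succeeds only if the refcount equals 'expected', then sets it to 0. */
static bool model_ref_freeze(struct model_page *page, int expected)
{
	assert(expected > 0);
	return atomic_compare_exchange_strong(&page->refcount, &expected, 0);
}

/* Take a reference unless the refcount is already 0 (for example, frozen). */
static bool model_get_unless_zero(struct model_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; dropping one from a zero refcount is the BUG case. */
static bool model_put_testzero(struct model_page *page)
{
	assert(atomic_load(&page->refcount) != 0);
	return atomic_fetch_sub(&page->refcount, 1) == 1;
}
```

In this model a freeze cannot succeed while the waiter still holds its
reference, so hitting the zero-refcount check would require the freeze to
be attempted with an expected count that does not account for that
reference, or a put without a matching get.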

-- 
 Artem


[PATCH v2] x86/tools/relocs: fix big section header tables

2018-11-29 Thread Artem Savkov
When the number of entries in the section header table is larger than
or equal to SHN_LORESERVE, the size of the table is held in the sh_size
member of the initial entry in the section header table instead of e_shnum.
The same applies to the string table index, which is located in sh_link
instead of e_shstrndx.

This case is easily reproducible with KCFLAGS="-ffunction-sections":
the bzImage build fails with a "String table index out of bounds" error.

Signed-off-by: Artem Savkov 
---
 arch/x86/tools/relocs.c | 58 +
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index b629f6992d9f..1f5dcec15d4e 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -11,7 +11,9 @@
 #define Elf_Shdr   ElfW(Shdr)
 #define Elf_Sym		ElfW(Sym)
 
-static Elf_Ehdr ehdr;
+static Elf_Ehdr		ehdr;
+static unsigned long	shnum;
+static unsigned int	shstrndx;
 
 struct relocs {
uint32_t*offset;
@@ -241,9 +243,9 @@ static const char *sec_name(unsigned shndx)
 {
const char *sec_strtab;
const char *name;
-   sec_strtab = secs[ehdr.e_shstrndx].strtab;
+   sec_strtab = secs[shstrndx].strtab;
name = "";
-   if (shndx < ehdr.e_shnum) {
+   if (shndx < shnum) {
name = sec_strtab + secs[shndx].shdr.sh_name;
}
else if (shndx == SHN_ABS) {
@@ -271,7 +273,7 @@ static const char *sym_name(const char *sym_strtab, Elf_Sym *sym)
 static Elf_Sym *sym_lookup(const char *symname)
 {
 	int i;
-	for (i = 0; i < ehdr.e_shnum; i++) {
+	for (i = 0; i < shnum; i++) {
 		struct section *sec = &secs[i];
long nsyms;
char *strtab;
@@ -366,6 +368,9 @@ static void read_ehdr(FILE *fp)
ehdr.e_shnum = elf_half_to_cpu(ehdr.e_shnum);
ehdr.e_shstrndx  = elf_half_to_cpu(ehdr.e_shstrndx);
 
+   shnum = ehdr.e_shnum;
+   shstrndx = ehdr.e_shstrndx;
+
if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
die("Unsupported ELF header type\n");
}
@@ -384,7 +389,26 @@ static void read_ehdr(FILE *fp)
if (ehdr.e_shentsize != sizeof(Elf_Shdr)) {
die("Bad section header entry\n");
}
-   if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+
+	if (shnum == SHN_UNDEF || shstrndx == SHN_XINDEX) {
+		Elf_Shdr shdr;
+
+		if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0)
+			die("Seek to %d failed: %s\n",
+				ehdr.e_shoff, strerror(errno));
+
+		if (fread(&shdr, sizeof(shdr), 1, fp) != 1)
+			die("Cannot read initial ELF section header: %s\n",
+				strerror(errno));
+
+		if (shnum == SHN_UNDEF)
+			shnum = elf_xword_to_cpu(shdr.sh_size);
+
+		if (shstrndx == SHN_XINDEX)
+			shstrndx = elf_word_to_cpu(shdr.sh_link);
+	}
+
+   if (shstrndx >= shnum) {
die("String table index out of bounds\n");
}
 }
@@ -394,20 +418,20 @@ static void read_shdrs(FILE *fp)
int i;
Elf_Shdr shdr;
 
-   secs = calloc(ehdr.e_shnum, sizeof(struct section));
+   secs = calloc(shnum, sizeof(struct section));
if (!secs) {
die("Unable to allocate %d section headers\n",
-   ehdr.e_shnum);
+   shnum);
}
if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0) {
die("Seek to %d failed: %s\n",
ehdr.e_shoff, strerror(errno));
}
-	for (i = 0; i < ehdr.e_shnum; i++) {
+	for (i = 0; i < shnum; i++) {
 		struct section *sec = &secs[i];
 		if (fread(&shdr, sizeof(shdr), 1, fp) != 1)
die("Cannot read ELF section headers %d/%d: %s\n",
-   i, ehdr.e_shnum, strerror(errno));
+   i, shnum, strerror(errno));
sec->shdr.sh_name  = elf_word_to_cpu(shdr.sh_name);
sec->shdr.sh_type  = elf_word_to_cpu(shdr.sh_type);
sec->shdr.sh_flags = elf_xword_to_cpu(shdr.sh_flags);
@@ -418,7 +442,7 @@ static void read_shdrs(FILE *fp)
sec->shdr.sh_info  = elf_word_to_cpu(shdr.sh_info);
sec->shdr.sh_addralign = elf_xword_to_cpu(shdr.sh_addralign);
sec->shdr.sh_entsize   = elf_xword_to_cpu(shdr.sh_entsize);
-   if (sec->shdr.sh_link < ehdr.e_shnum)
+   if (sec->shdr.sh_link < shnum)
 			sec->link = &secs[sec->shdr.sh_link];
}
 
@@ -427,7 +451,7 @@ static void read_shdrs(FIL

Re: [PATCH] x86/tools/relocs: fix big section header tables

2018-11-29 Thread Artem Savkov
On Thu, Nov 29, 2018 at 08:23:12AM -0600, Josh Poimboeuf wrote:
> On Thu, Nov 29, 2018 at 02:51:33PM +0100, Artem Savkov wrote:
> > When the number of entries in the section header table is larger than
> > or equal to SHN_LORESERVE, the size of the table is held in the sh_size
> > member of the initial entry in the section header table instead of e_shnum.
> > The same applies to the string table index, which is located in sh_link
> > instead of e_shstrndx.
> > 
> > This case is easily reproducible with KCFLAGS="-ffunction-sections":
> > the bzImage build fails with a "String table index out of bounds" error.
> > 
> > Signed-off-by: Artem Savkov 
> > ---
> >  arch/x86/tools/relocs.c | 58 +
> >  1 file changed, 41 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
> > index b629f6992d9f..5275ea0a0d13 100644
> > --- a/arch/x86/tools/relocs.c
> > +++ b/arch/x86/tools/relocs.c
> > @@ -11,7 +11,9 @@
> >  #define Elf_Shdr   ElfW(Shdr)
> > >  #define Elf_Sym    ElfW(Sym)
> > >  
> > > -static Elf_Ehdr ehdr;
> > > +static Elf_Ehdr        ehdr;
> 
> I think there's a tab missing here, it doesn't line up with the other
> variables.

This seems to be a vim bug. It aligns perfectly in cat, less and on
lore.kernel.org, which all seem to use tabstop=8 by default. It does not
align in vim, although it does align there with tabstop=7.


-- 
 Artem
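
The tabstop effect discussed above can be checked with a small sketch. It
assumes, purely for illustration, that the declaration has a single tab after
"static Elf_Ehdr" (the exact whitespace of the patch is not recoverable from
this archive). display_width() is a hypothetical helper that expands tabs the
way a terminal or editor would.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the display column reached after printing `s`,
 * expanding each tab to the next multiple of `tabstop`.
 */
static size_t display_width(const char *s, size_t tabstop)
{
	size_t col = 0;

	for (; *s; s++) {
		if (*s == '\t')
			col += tabstop - (col % tabstop);	/* jump to next tab stop */
		else
			col++;
	}
	return col;
}
```

Under that assumption, "static Elf_Ehdr\t" ends at column 16 and
"static unsigned long\t" at column 24 with tabstop=8 (misaligned in the file),
while both end at column 21 with tabstop=7. In a diff, where each line carries
a leading "+", both end at column 24 with tabstop=8, which would explain why
the patch looks aligned in cat or less.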


[PATCH] x86/tools/relocs: fix big section header tables

2018-11-29 Thread Artem Savkov
When the number of entries in the section header table is larger than
or equal to SHN_LORESERVE, the size of the table is held in the sh_size
member of the initial entry in the section header table instead of
e_shnum. The same goes for the string table index, which is located in
sh_link instead of e_shstrndx.

This case is easily reproducible with KCFLAGS="-ffunction-sections":
the bzImage build fails with a "String table index out of bounds" error.

Signed-off-by: Artem Savkov 
---
 arch/x86/tools/relocs.c | 58 +
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index b629f6992d9f..5275ea0a0d13 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -11,7 +11,9 @@
 #define Elf_Shdr   ElfW(Shdr)
 #define Elf_Sym    ElfW(Sym)
 
-static Elf_Ehdr ehdr;
+static Elf_Ehdr        ehdr;
+static unsigned long   shnum;
+static unsigned int    shstrndx;
 
 struct relocs {
uint32_t *offset;
@@ -241,9 +243,9 @@ static const char *sec_name(unsigned shndx)
 {
const char *sec_strtab;
const char *name;
-   sec_strtab = secs[ehdr.e_shstrndx].strtab;
+   sec_strtab = secs[shstrndx].strtab;
name = "";
-   if (shndx < ehdr.e_shnum) {
+   if (shndx < shnum) {
name = sec_strtab + secs[shndx].shdr.sh_name;
}
else if (shndx == SHN_ABS) {
@@ -271,7 +273,7 @@ static const char *sym_name(const char *sym_strtab, Elf_Sym *sym)
 static Elf_Sym *sym_lookup(const char *symname)
 {
int i;
-   for (i = 0; i < ehdr.e_shnum; i++) {
+   for (i = 0; i < shnum; i++) {
struct section *sec = &secs[i];
long nsyms;
char *strtab;
@@ -366,6 +368,9 @@ static void read_ehdr(FILE *fp)
ehdr.e_shnum = elf_half_to_cpu(ehdr.e_shnum);
ehdr.e_shstrndx  = elf_half_to_cpu(ehdr.e_shstrndx);
 
+   shnum = ehdr.e_shnum;
+   shstrndx = ehdr.e_shstrndx;
+
if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
die("Unsupported ELF header type\n");
}
@@ -384,7 +389,26 @@ static void read_ehdr(FILE *fp)
if (ehdr.e_shentsize != sizeof(Elf_Shdr)) {
die("Bad section header entry\n");
}
-   if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+
+   if (shnum == SHN_UNDEF || shstrndx == SHN_XINDEX) {
+   Elf_Shdr shdr;
+
+   if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0)
+   die("Seek to %d failed: %s\n",
+   ehdr.e_shoff, strerror(errno));
+
+   if (fread(&shdr, sizeof(shdr), 1, fp) != 1)
+   die("Cannot read initial ELF section header: %s\n",
+   strerror(errno));
+
+   if (shnum == SHN_UNDEF)
+   shnum = elf_xword_to_cpu(shdr.sh_size);
+
+   if (shstrndx == SHN_XINDEX)
+   shstrndx = elf_word_to_cpu(shdr.sh_link);
+   }
+
+   if (shstrndx >= shnum) {
die("String table index out of bounds\n");
}
 }
@@ -394,20 +418,20 @@ static void read_shdrs(FILE *fp)
int i;
Elf_Shdr shdr;
 
-   secs = calloc(ehdr.e_shnum, sizeof(struct section));
+   secs = calloc(shnum, sizeof(struct section));
if (!secs) {
die("Unable to allocate %d section headers\n",
-   ehdr.e_shnum);
+   shnum);
}
if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0) {
die("Seek to %d failed: %s\n",
ehdr.e_shoff, strerror(errno));
}
-   for (i = 0; i < ehdr.e_shnum; i++) {
+   for (i = 0; i < shnum; i++) {
struct section *sec = &secs[i];
if (fread(&shdr, sizeof(shdr), 1, fp) != 1)
die("Cannot read ELF section headers %d/%d: %s\n",
-   i, ehdr.e_shnum, strerror(errno));
+   i, shnum, strerror(errno));
sec->shdr.sh_name  = elf_word_to_cpu(shdr.sh_name);
sec->shdr.sh_type  = elf_word_to_cpu(shdr.sh_type);
sec->shdr.sh_flags = elf_xword_to_cpu(shdr.sh_flags);
@@ -418,7 +442,7 @@ static void read_shdrs(FILE *fp)
sec->shdr.sh_info  = elf_word_to_cpu(shdr.sh_info);
sec->shdr.sh_addralign = elf_xword_to_cpu(shdr.sh_addralign);
sec->shdr.sh_entsize   = elf_xword_to_cpu(shdr.sh_entsize);
-   if (sec->shdr.sh_link < ehdr.e_shnum)
+   if (sec->shdr.sh_link < shnum)
sec->link = &secs[sec->shdr.sh_link];
}
 
@@ -427,7 +451,7 @@ static void read_shdrs(FIL
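
The fallback described in the commit message can be sketched outside of
relocs.c as follows. This is a minimal illustration, not the patch itself:
resolve_shnum() and resolve_shstrndx() are hypothetical names, the sketch
assumes 64-bit ELF structures from <elf.h> that are already in host byte
order, and the e_shoff check is an extra guard that the patch does not have.

```c
#include <elf.h>
#include <stdint.h>

/*
 * Hypothetical helper: real number of section headers.
 * If e_shnum is 0 (SHN_UNDEF) although a section header table exists,
 * the count is stored in sh_size of the initial section header.
 */
static uint64_t resolve_shnum(const Elf64_Ehdr *eh, const Elf64_Shdr *sh0)
{
	if (eh->e_shnum == SHN_UNDEF && eh->e_shoff != 0)
		return sh0->sh_size;
	return eh->e_shnum;
}

/*
 * Hypothetical helper: real index of the section name string table.
 * If e_shstrndx is SHN_XINDEX, the index is stored in sh_link of the
 * initial section header.
 */
static uint32_t resolve_shstrndx(const Elf64_Ehdr *eh, const Elf64_Shdr *sh0)
{
	if (eh->e_shstrndx == SHN_XINDEX)
		return sh0->sh_link;
	return eh->e_shstrndx;
}
```

A caller would read the ELF header and the first section header, resolve both
values, and only then check that the string table index is below the section
count.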

[tip:core/urgent] objtool: Fix segfault in .cold detection with -ffunction-sections

2018-11-20 Thread tip-bot for Artem Savkov
Commit-ID:  22566c1603030f0a036ad564634b064ad1a55db2
Gitweb: https://git.kernel.org/tip/22566c1603030f0a036ad564634b064ad1a55db2
Author: Artem Savkov 
AuthorDate: Tue, 20 Nov 2018 11:52:16 -0600
Committer:  Ingo Molnar 
CommitDate: Tue, 20 Nov 2018 18:59:00 +0100

objtool: Fix segfault in .cold detection with -ffunction-sections

Because find_symbol_by_name() traverses the same lists as
read_symbols(), changing sym->name in place without copying it affects
the result of find_symbol_by_name().  In the case where a ".cold"
function precedes its parent in sec->symbol_list, it can result in a
function being considered a parent of itself. This leads to function
length being set to 0 and other consequent side-effects including a
segfault in add_switch_table().  The effects of this bug are only
visible when building with -ffunction-sections in KCFLAGS.

Fix by copying the search string instead of modifying it in place.

Signed-off-by: Artem Savkov 
Signed-off-by: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: 13810435b9a7 ("objtool: Support GCC 8's cold subfunctions")
Link: http://lkml.kernel.org/r/910abd6b5a4945130fd44f787c24e07b9e07c8da.1542736240.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 tools/objtool/elf.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index e7a7ac40e045..b8f3cca8e58b 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -31,6 +31,8 @@
 #include "elf.h"
 #include "warn.h"
 
+#define MAX_NAME_LEN 128
+
 struct section *find_section_by_name(struct elf *elf, const char *name)
 {
struct section *sec;
@@ -298,6 +300,8 @@ static int read_symbols(struct elf *elf)
/* Create parent/child links for any cold subfunctions */
list_for_each_entry(sec, &elf->sections, list) {
list_for_each_entry(sym, &sec->symbol_list, list) {
+   char pname[MAX_NAME_LEN + 1];
+   size_t pnamelen;
if (sym->type != STT_FUNC)
continue;
sym->pfunc = sym->cfunc = sym;
@@ -305,9 +309,16 @@ static int read_symbols(struct elf *elf)
if (!coldstr)
continue;
 
-   coldstr[0] = '\0';
-   pfunc = find_symbol_by_name(elf, sym->name);
-   coldstr[0] = '.';
+   pnamelen = coldstr - sym->name;
+   if (pnamelen > MAX_NAME_LEN) {
+   WARN("%s(): parent function name exceeds maximum length of %d characters",
+sym->name, MAX_NAME_LEN);
+   return -1;
+   }
+
+   strncpy(pname, sym->name, pnamelen);
+   pname[pnamelen] = '\0';
+   pfunc = find_symbol_by_name(elf, pname);
 
if (!pfunc) {
WARN("%s(): can't find parent function",


[tip:core/urgent] objtool: Fix double-free in .cold detection error path

2018-11-20 Thread tip-bot for Artem Savkov
Commit-ID:  0b9301fb632f7111a3293a30cc5b20f1b82ed08d
Gitweb: https://git.kernel.org/tip/0b9301fb632f7111a3293a30cc5b20f1b82ed08d
Author: Artem Savkov 
AuthorDate: Tue, 20 Nov 2018 11:52:15 -0600
Committer:  Ingo Molnar 
CommitDate: Tue, 20 Nov 2018 18:59:00 +0100

objtool: Fix double-free in .cold detection error path

If read_symbols() fails during second list traversal (the one dealing
with ".cold" subfunctions) it frees the symbol, but never deletes it
from the list/hash_table resulting in symbol being freed again in
elf_close(). Fix it by just returning an error, leaving cleanup to
elf_close().

Signed-off-by: Artem Savkov 
Signed-off-by: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: 13810435b9a7 ("objtool: Support GCC 8's cold subfunctions")
Link: http://lkml.kernel.org/r/beac5a9b7da9e8be90223459dcbe07766ae437dd.1542736240.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 tools/objtool/elf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 6dbb9fae0f9d..e7a7ac40e045 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -312,7 +312,7 @@ static int read_symbols(struct elf *elf)
if (!pfunc) {
WARN("%s(): can't find parent function",
 sym->name);
-   goto err;
+   return -1;
}
 
sym->pfunc = pfunc;


[PATCH v4 2/2] objtool: fix .cold functions parent symbols search

2018-11-20 Thread Artem Savkov
Because find_symbol_by_name() traverses the same lists as read_symbols(),
changing sym->name in place without copying it affects the result of
find_symbol_by_name(). When a ".cold" function precedes its parent in
sec->symbol_list, this can result in a function being considered a
parent of itself. This leads to the function length being set to 0 and
other consequent side effects, including a segfault in
add_switch_table(). The effects of this bug are only visible when
building with -ffunction-sections in KCFLAGS.

Fix by copying the search string instead of modifying it in place.

Fixes: 13810435b9a7 ("objtool: Support GCC 8's cold subfunctions")
Signed-off-by: Artem Savkov 
---
 tools/objtool/elf.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index e7a7ac40e045..b8f3cca8e58b 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -31,6 +31,8 @@
 #include "elf.h"
 #include "warn.h"
 
+#define MAX_NAME_LEN 128
+
 struct section *find_section_by_name(struct elf *elf, const char *name)
 {
struct section *sec;
@@ -298,6 +300,8 @@ static int read_symbols(struct elf *elf)
/* Create parent/child links for any cold subfunctions */
list_for_each_entry(sec, &elf->sections, list) {
list_for_each_entry(sym, &sec->symbol_list, list) {
+   char pname[MAX_NAME_LEN + 1];
+   size_t pnamelen;
if (sym->type != STT_FUNC)
continue;
sym->pfunc = sym->cfunc = sym;
@@ -305,9 +309,16 @@ static int read_symbols(struct elf *elf)
if (!coldstr)
continue;
 
-   coldstr[0] = '\0';
-   pfunc = find_symbol_by_name(elf, sym->name);
-   coldstr[0] = '.';
+   pnamelen = coldstr - sym->name;
+   if (pnamelen > MAX_NAME_LEN) {
+   WARN("%s(): parent function name exceeds maximum length of %d characters",
+sym->name, MAX_NAME_LEN);
+   return -1;
+   }
+
+   strncpy(pname, sym->name, pnamelen);
+   pname[pnamelen] = '\0';
+   pfunc = find_symbol_by_name(elf, pname);
 
if (!pfunc) {
WARN("%s(): can't find parent function",
-- 
2.19.1
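
The "copy instead of modify in place" idea from this patch can be sketched in
isolation. This is a minimal illustration, not the objtool code:
cold_parent_name() is a hypothetical helper, and matching on the substring
".cold" is an assumption made for the example. MAX_NAME_LEN mirrors the
constant introduced by the patch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 128

/*
 * Hypothetical helper: copy the part of `name` that precedes ".cold" into
 * `buf` without touching `name` itself.
 * Returns 0 on success, 1 if `name` has no ".cold" part, and -1 if the
 * parent name does not fit into `buf`.
 */
static int cold_parent_name(const char *name, char *buf, size_t buflen)
{
	const char *coldstr = strstr(name, ".cold");
	size_t len;

	if (!coldstr)
		return 1;

	len = (size_t)(coldstr - name);
	if (len + 1 > buflen)		/* need room for the terminating NUL */
		return -1;

	memcpy(buf, name, len);
	buf[len] = '\0';
	return 0;
}
```

Because the lookup key lives in a separate buffer, any code that walks the
same symbol list during the lookup still sees every name unmodified.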



[PATCH v4 0/2] objtool: read_symbols() fixes

2018-11-20 Thread Artem Savkov
The series started with the 'parent symbol search' patch, but I found another
issue in read_symbols() while testing the failure path.

Artem Savkov (2):
  objtool: fix failed cold symbol doublefree
  objtool: fix .cold functions parent symbols search

 tools/objtool/elf.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

-- 
2.19.1



[PATCH v4 1/2] objtool: fix failed cold symbol doublefree

2018-11-20 Thread Artem Savkov
If read_symbols() fails during the second list traversal (the one dealing
with ".cold" subfunctions), it frees the symbol but never deletes it
from the list/hash_table, resulting in the symbol being freed again in
elf_close(). Fix by just returning an error, leaving cleanup to
elf_close().

Fixes: 13810435b9a7 ("objtool: Support GCC 8's cold subfunctions")
Signed-off-by: Artem Savkov 
---
 tools/objtool/elf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 6dbb9fae0f9d..e7a7ac40e045 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -312,7 +312,7 @@ static int read_symbols(struct elf *elf)
if (!pfunc) {
WARN("%s(): can't find parent function",
 sym->name);
-   goto err;
+   return -1;
}
 
sym->pfunc = pfunc;
-- 
2.19.1
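
The ownership rule behind this fix can be sketched with a toy list. This is a
simplified stand-in under stated assumptions, not the objtool data structures:
struct table, table_add(), table_check() and table_close() are hypothetical
names. The checking pass only reports an error, and a single cleanup function
frees every node exactly once.

```c
#include <stdlib.h>

struct sym {
	struct sym *next;
	int valid;
};

struct table {
	struct sym *head;
	int live;		/* number of allocated, not yet freed nodes */
};

/* Allocate a node and link it into the table. Returns -1 on failure. */
static int table_add(struct table *t, int valid)
{
	struct sym *s = malloc(sizeof(*s));

	if (!s)
		return -1;
	s->valid = valid;
	s->next = t->head;
	t->head = s;
	t->live++;
	return 0;
}

/*
 * Second pass over already linked nodes. On error it just returns;
 * freeing a node here would leave a dangling pointer in the list.
 */
static int table_check(const struct table *t)
{
	const struct sym *s;

	for (s = t->head; s; s = s->next)
		if (!s->valid)
			return -1;
	return 0;
}

/* Sole owner of cleanup: unlinks and frees each node exactly once. */
static void table_close(struct table *t)
{
	while (t->head) {
		struct sym *s = t->head;

		t->head = s->next;
		free(s);
		t->live--;
	}
}
```

A caller treats a failing table_check() like any other error and still calls
table_close(), so the failure path and the normal path share one cleanup.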



[PATCH v3 2/2] objtool: fix .cold functions parent symbols search

2018-11-12 Thread Artem Savkov
Because find_symbol_by_name() traverses the same lists as read_symbols(),
changing sym->name in place without copying it affects the result of
find_symbol_by_name(). When a ".cold" function precedes its parent in
sec->symbol_list, this can result in a function being considered a
parent of itself. This leads to the function length being set to 0 and
other consequent side effects, including a segfault in
add_switch_table(). The effects of this bug are only visible when
building with -ffunction-sections in KCFLAGS.

Fix by copying the search string instead of modifying it in place.

Fixes: 13810435b9a7 "objtool: Support GCC 8's cold subfunctions"
Signed-off-by: Artem Savkov 
---
 tools/objtool/elf.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 3decd43477df..15d9acfb2c97 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -31,6 +31,8 @@
 #include "elf.h"
 #include "warn.h"
 
+#define MAX_NAME_LEN 128
+
 struct section *find_section_by_name(struct elf *elf, const char *name)
 {
struct section *sec;
@@ -298,6 +300,8 @@ static int read_symbols(struct elf *elf)
/* Create parent/child links for any cold subfunctions */
list_for_each_entry(sec, &elf->sections, list) {
list_for_each_entry(sym, &sec->symbol_list, list) {
+   char pname[MAX_NAME_LEN + 1];
+   size_t pnamelen;
if (sym->type != STT_FUNC)
continue;
sym->pfunc = sym->cfunc = sym;
@@ -305,9 +309,16 @@ static int read_symbols(struct elf *elf)
if (!coldstr)
continue;
 
-   coldstr[0] = '\0';
-   pfunc = find_symbol_by_name(elf, sym->name);
-   coldstr[0] = '.';
+   pnamelen = coldstr - sym->name;
+   if (pnamelen > MAX_NAME_LEN) {
+   WARN("%s(): parent function name exceeds maximum length of %d characters",
+sym->name, MAX_NAME_LEN);
+   goto cold_err;
+   }
+
+   strncpy(pname, sym->name, pnamelen);
+   pname[pnamelen] = '\0';
+   pfunc = find_symbol_by_name(elf, pname);
 
if (!pfunc) {
WARN("%s(): can't find parent function",
-- 
2.17.2



[PATCH v3 1/2] objtool: fix failed cold symbol doublefree

2018-11-12 Thread Artem Savkov
If read_symbols() fails during the second list traversal (the one dealing
with ".cold" subfunctions), it frees the symbol but never deletes it
from the list/hash_table, resulting in the symbol being freed again in
elf_close().

Fixes: 13810435b9a7 "objtool: Support GCC 8's cold subfunctions"
Signed-off-by: Artem Savkov 
---
 tools/objtool/elf.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 6dbb9fae0f9d..3decd43477df 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -312,7 +312,7 @@ static int read_symbols(struct elf *elf)
if (!pfunc) {
WARN("%s(): can't find parent function",
 sym->name);
-   goto err;
+   goto cold_err;
}
 
sym->pfunc = pfunc;
@@ -336,6 +336,9 @@ static int read_symbols(struct elf *elf)
 
return 0;
 
+cold_err:
+   list_del(&sym->list);
+   hash_del(&sym->hash);
 err:
free(sym);
return -1;
-- 
2.17.2



[PATCH v3 0/2] objtool: read_symbols() fixes

2018-11-12 Thread Artem Savkov
The series started with the 'parent symbol search' patch, but I found another
issue in read_symbols() while testing the failure path.

Artem Savkov (2):
  objtool: fix failed cold symbol doublefree
  objtool: fix .cold functions parent symbols search

 tools/objtool/elf.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

-- 
2.17.2



Re: [PATCH v2] objtool: fix .cold. functions parent symbols search

2018-11-10 Thread Artem Savkov
On Fri, Nov 09, 2018 at 11:23:09AM -0600, Josh Poimboeuf wrote:
> On Wed, Nov 07, 2018 at 10:45:15PM +0100, Artem Savkov wrote:
> > Because find_symbol_by_name() traverses the same lists as read_symbols()
> > changing sym->name in place without copying it affects the result of
> > find_symbol_by_name() and, in case when ".cold" function precedes its
> > parent in sec->symbol_list, can result in function being considered a
> > parent of itself. This leads to function length being set to 0 and other
> > consequent side-effects including a segfault in add_switch_table().
> > The effects of this bug are only visible when building with
> > -ffunction-sections in KCFLAGS.
> > 
> > Fix by copying the search string instead of modifying it in place.
> > 
> > Signed-off-by: Artem Savkov 
> 
> This needs a "Fixes" tag to identify the patch which introduced the bug.

Ok, will do.

> > ---
> >  tools/objtool/elf.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
> > index 6dbb9fae0f9d..781c8afb29b9 100644
> > --- a/tools/objtool/elf.c
> > +++ b/tools/objtool/elf.c
> > @@ -298,6 +298,7 @@ static int read_symbols(struct elf *elf)
> > /* Create parent/child links for any cold subfunctions */
> > list_for_each_entry(sec, &elf->sections, list) {
> > list_for_each_entry(sym, &sec->symbol_list, list) {
> > +   char *pname;
> > if (sym->type != STT_FUNC)
> > continue;
> > sym->pfunc = sym->cfunc = sym;
> > @@ -305,9 +306,9 @@ static int read_symbols(struct elf *elf)
> > if (!coldstr)
> > continue;
> >  
> > -   coldstr[0] = '\0';
> > -   pfunc = find_symbol_by_name(elf, sym->name);
> > -   coldstr[0] = '.';
> > +   pname = strndup(sym->name, coldstr - sym->name);
> > +   pfunc = find_symbol_by_name(elf, pname);
> > +   free(pname);
> >  
> > if (!pfunc) {
> > WARN("%s(): can't find parent function",
> 
> strndup()'s return code needs to be checked.
> 
> Also, for such a short-lived allocation, I think a stack-allocated
> string would be better.

Hm, there seems to be no limit on lengths of strings in string table.
What size would you consider reasonable for this stack-allocated string?

-- 
Regards,
  Artem


[PATCH v2] objtool: fix .cold. functions parent symbols search

2018-11-07 Thread Artem Savkov
Because find_symbol_by_name() traverses the same lists as read_symbols(),
changing sym->name in place without copying it affects the result of
find_symbol_by_name() and, when a ".cold" function precedes its parent
in sec->symbol_list, can result in the function being considered a
parent of itself. This leads to the function length being set to 0 and
other side effects, including a segfault in add_switch_table().
The effects of this bug are only visible when building with
-ffunction-sections in KCFLAGS.

Fix by copying the search string instead of modifying it in place.

Signed-off-by: Artem Savkov 
---
 tools/objtool/elf.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 6dbb9fae0f9d..781c8afb29b9 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -298,6 +298,7 @@ static int read_symbols(struct elf *elf)
/* Create parent/child links for any cold subfunctions */
list_for_each_entry(sec, &elf->sections, list) {
list_for_each_entry(sym, &sec->symbol_list, list) {
+   char *pname;
if (sym->type != STT_FUNC)
continue;
sym->pfunc = sym->cfunc = sym;
@@ -305,9 +306,9 @@ static int read_symbols(struct elf *elf)
if (!coldstr)
continue;
 
-   coldstr[0] = '\0';
-   pfunc = find_symbol_by_name(elf, sym->name);
-   coldstr[0] = '.';
+   pname = strndup(sym->name, coldstr - sym->name);
+   pfunc = find_symbol_by_name(elf, pname);
+   free(pname);
 
if (!pfunc) {
WARN("%s(): can't find parent function",
-- 
2.17.2
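
The failure described in the commit message above can be reproduced outside
objtool. This is a minimal sketch under stated assumptions: struct sym,
find_by_name(), parent_in_place() and parent_by_copy() are hypothetical
stand-ins and not objtool's own types or functions; an array walk replaces
the list walk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for objtool's struct symbol; only the name matters. */
struct sym {
	char name[32];
};

/* Walk the symbols in order and return the first one whose name matches. */
static struct sym *find_by_name(struct sym *syms, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(syms[i].name, name))
			return &syms[i];
	return NULL;
}

/* Old approach: truncate the child's name in place, search, then restore. */
static struct sym *parent_in_place(struct sym *syms, size_t n, struct sym *child)
{
	char *coldstr = strstr(child->name, ".cold");
	struct sym *pfunc;

	if (!coldstr)
		return NULL;
	coldstr[0] = '\0';	/* the child now carries the parent's name */
	pfunc = find_by_name(syms, n, child->name);
	coldstr[0] = '.';
	return pfunc;
}

/* Fixed approach: search for a private copy of the prefix. */
static struct sym *parent_by_copy(struct sym *syms, size_t n, struct sym *child)
{
	char pname[32];
	char *coldstr = strstr(child->name, ".cold");
	size_t len;

	if (!coldstr)
		return NULL;
	len = (size_t)(coldstr - child->name);
	assert(len < sizeof(pname));
	memcpy(pname, child->name, len);
	pname[len] = '\0';
	return find_by_name(syms, n, pname);
}
```

With the symbols ordered as { "foo.cold.1", "foo" }, parent_in_place()
returns the child itself because the truncated child is the first match,
while parent_by_copy() returns the real "foo" entry.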



Re: [PATCH] objtool: fix .cold. functions parent symbols search

2018-11-07 Thread Artem Savkov
On Wed, Nov 07, 2018 at 11:08:56AM -0600, Josh Poimboeuf wrote:
> On Wed, Nov 07, 2018 at 03:05:59PM +0100, Artem Savkov wrote:
> > The way it is currently done it is possible for read_symbols() to find the
> > same symbol as parent for ".cold" functions.
> 
> I seem to remember having this discussion for kpatch-build, but I forget
> the details.  Can you explain how this can happen (and also add that
> detail to the commit message)?

find_symbol_by_name() traverses the same lists as read_symbols(), and when
we change sym->name in place without copying it, the name changes in the
list as well. So if the child function comes before its parent in
sec->symbol_list, the child itself will be returned as "parent". I find it
hard to put this into words worthy of a commit message.

> I haven't seen any bug reports, is it purely theoretical?

No, 4.20-rc1 (actually anything after 4a60aa05a063 "objtool: Support
per-function rodata sections", before that add_switch_table() doesn't
seem to be called for those .cold. funcs) fails to build for me with
KCFLAGS="-ffunction-sections -fdata-sections" because objtool is
segfaulting.

> > This leads to a bunch of
> > complications such as func length being set to 0 and a segfault in
> > add_switch_table(). Fix by copying the search string instead of modifying
> > it in place.
> > 
> > Signed-off-by: Artem Savkov 
> > ---
> >  tools/objtool/elf.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
> > index 6dbb9fae0f9d..781c8afb29b9 100644
> > --- a/tools/objtool/elf.c
> > +++ b/tools/objtool/elf.c
> > @@ -298,6 +298,7 @@ static int read_symbols(struct elf *elf)
> > /* Create parent/child links for any cold subfunctions */
> > list_for_each_entry(sec, &elf->sections, list) {
> > list_for_each_entry(sym, &sec->symbol_list, list) {
> > +   char *pname;
> > if (sym->type != STT_FUNC)
> > continue;
> > sym->pfunc = sym->cfunc = sym;
> > @@ -305,9 +306,9 @@ static int read_symbols(struct elf *elf)
> > if (!coldstr)
> > continue;
> >  
> > -   coldstr[0] = '\0';
> > -   pfunc = find_symbol_by_name(elf, sym->name);
> > -   coldstr[0] = '.';
> > +   pname = strndup(sym->name, coldstr - sym->name);
> > +   pfunc = find_symbol_by_name(elf, pname);
> > +   free(pname);
> >  
> > if (!pfunc) {
> > WARN("%s(): can't find parent function",
> > -- 
> > 2.17.2
> > 
> 
> -- 
> Josh

-- 
Regards,
  Artem


[PATCH] objtool: fix .cold. functions parent symbols search

2018-11-07 Thread Artem Savkov
The way it is currently done it is possible for read_symbols() to find the
same symbol as parent for ".cold" functions. This leads to a bunch of
complications such as func length being set to 0 and a segfault in
add_switch_table(). Fix by copying the search string instead of modifying
it in place.

Signed-off-by: Artem Savkov 
---
 tools/objtool/elf.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 6dbb9fae0f9d..781c8afb29b9 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -298,6 +298,7 @@ static int read_symbols(struct elf *elf)
/* Create parent/child links for any cold subfunctions */
list_for_each_entry(sec, &elf->sections, list) {
list_for_each_entry(sym, &sec->symbol_list, list) {
+   char *pname;
if (sym->type != STT_FUNC)
continue;
sym->pfunc = sym->cfunc = sym;
@@ -305,9 +306,9 @@ static int read_symbols(struct elf *elf)
if (!coldstr)
continue;
 
-   coldstr[0] = '\0';
-   pfunc = find_symbol_by_name(elf, sym->name);
-   coldstr[0] = '.';
+   pname = strndup(sym->name, coldstr - sym->name);
+   pfunc = find_symbol_by_name(elf, pname);
+   free(pname);
 
if (!pfunc) {
WARN("%s(): can't find parent function",
-- 
2.17.2



[PATCH v2] tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure

2018-07-25 Thread Artem Savkov
If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
it returns an error, but does not unset the tp flags it set previously.
This results in a probe being considered enabled and failures like being
unable to remove the probe through kprobe_events file since probes_open()
expects every probe to be disabled.

Signed-off-by: Artem Savkov 
---
 kernel/trace/trace_kprobe.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 21f718472942..27ace4513c43 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -400,11 +400,10 @@ static struct trace_kprobe *find_trace_kprobe(const char *event,
 static int
 enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file)
 {
+   struct event_file_link *link;
int ret = 0;
 
if (file) {
-   struct event_file_link *link;
-
link = kmalloc(sizeof(*link), GFP_KERNEL);
if (!link) {
ret = -ENOMEM;
@@ -424,6 +423,16 @@ enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file)
else
ret = enable_kprobe(&tk->rp.kp);
}
+
+   if (ret) {
+   if (file) {
+   list_del_rcu(&link->list);
+   kfree(link);
+   tk->tp.flags &= ~TP_FLAG_TRACE;
+   } else {
+   tk->tp.flags &= ~TP_FLAG_PROFILE;
+   }
+   }
  out:
return ret;
 }
-- 
2.13.6
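
The rollback added by this patch follows a common pattern: whatever state
was set before the failing call is undone on the error path. The sketch
below shows that pattern in user space only. struct probe, do_enable() and
the FLAG_* values are hypothetical stand-ins for the kernel's trace_kprobe,
enable_k(ret)probe() and TP_FLAG_* definitions, and a counter replaces the
RCU-protected link list.

```c
#include <assert.h>

/* Stand-ins for TP_FLAG_TRACE / TP_FLAG_PROFILE; the values are arbitrary. */
#define FLAG_TRACE   (1u << 0)
#define FLAG_PROFILE (1u << 1)

struct probe {
	unsigned int flags;
	int links;	/* counts attached file links, in place of the list */
};

/* Hypothetical enable step; fail_with lets the caller force an error. */
static int do_enable(int fail_with)
{
	return fail_with;
}

static int enable_probe(struct probe *p, int has_file, int fail_with)
{
	int ret;

	assert(p);

	if (has_file) {
		p->links++;
		p->flags |= FLAG_TRACE;
	} else {
		p->flags |= FLAG_PROFILE;
	}

	ret = do_enable(fail_with);
	if (ret) {
		/* Undo exactly what was set above so the probe reads as disabled. */
		if (has_file) {
			p->links--;
			p->flags &= ~FLAG_TRACE;
		} else {
			p->flags &= ~FLAG_PROFILE;
		}
	}
	return ret;
}
```

Without the error branch, a failed call would leave FLAG_TRACE or
FLAG_PROFILE set, which corresponds to the "probe considered enabled"
symptom described in the commit message.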



[PATCH] kprobes: fix trace_probe flags in enable_trace_kprobe

2018-07-25 Thread Artem Savkov
If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
it returns an error, but does not unset the tp flags it set previously.
This results in a probe being considered enabled and failures like being
unable to remove the probe through kprobe_events file since probes_open()
expects every probe to be disabled.

Signed-off-by: Artem Savkov 
---
 kernel/trace/trace_kprobe.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 21f718472942..fb887ced5056 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -400,11 +400,10 @@ static struct trace_kprobe *find_trace_kprobe(const char *event,
 static int
 enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file)
 {
+   struct event_file_link *link;
int ret = 0;
 
if (file) {
-   struct event_file_link *link;
-
link = kmalloc(sizeof(*link), GFP_KERNEL);
if (!link) {
ret = -ENOMEM;
@@ -424,6 +423,16 @@ enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file)
else
ret = enable_kprobe(&tk->rp.kp);
}
+
+   if (ret) {
+   if (file) {
+   list_del(&link->list);
+   kfree(link);
+   tk->tp.flags &= ~TP_FLAG_TRACE;
+   } else {
+   tk->tp.flags &= ~TP_FLAG_PROFILE;
+   }
+   }
  out:
return ret;
 }
-- 
2.13.6



Re: [PATCH 1/2] sun4i_ss_prng: fix return value of sun4i_ss_prng_generate

2018-02-07 Thread Artem Savkov
On Wed, Feb 07, 2018 at 10:56:59AM +0100, Corentin Labbe wrote:
> On Tue, Feb 06, 2018 at 10:20:21PM +0100, Artem Savkov wrote:
> > According to crypto/rng.h generate function should return 0 on success
> > and < 0 on error.
> > 
> > Fixes: b8ae5c7387ad ("crypto: sun4i-ss - support the Security System PRNG")
> > Signed-off-by: Artem Savkov <artem.sav...@gmail.com>
> > ---
> >  drivers/crypto/sunxi-ss/sun4i-ss-prng.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-prng.c b/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
> > index 0d01d1624252..5754e0b92fb0 100644
> > --- a/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
> > +++ b/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
> > @@ -52,5 +52,5 @@ int sun4i_ss_prng_generate(struct crypto_rng *tfm, const u8 *src,
> >  
> > writel(0, ss->base + SS_CTL);
> > spin_unlock(&ss->slock);
> > -   return dlen;
> > +   return 0;
> >  }
> > -- 
> > 2.15.1
> > 
> 
> According to Documentation/crypto/api-samples.rst ("Code Example For Random 
> Number Generator Usage")
> you must return the length of data generated.

I don't think that example is the same as rng_alg.generate; it has a
different prototype.

> So crypto_rng_generate/crypto_rng_get_bytes documentation in crypto/rng.h 
> must be fixed.

It's not just documentation: every other rng driver returns 0 on success,
and returning the length gets aead_init_geniv() (and subsequently
crypto_create_tfm()) really confused because they expect the return value
to be 0 or < 0.

> Herbert could you confirm ?


-- 
Regards,
  Artem
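
The disagreement above comes down to how callers read the return value. A
rough user-space sketch follows, assuming a caller that treats any nonzero
result as a failure, as the reply describes for aead_init_geniv(). All names
here are hypothetical and this is not the crypto API itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generate callback with the pre-patch behaviour. */
static int generate_returns_len(unsigned char *dst, size_t dlen)
{
	memset(dst, 0xA5, dlen);	/* placeholder for real random data */
	return (int)dlen;		/* length on success */
}

/* Hypothetical generate callback with the post-patch behaviour. */
static int generate_returns_zero(unsigned char *dst, size_t dlen)
{
	memset(dst, 0xA5, dlen);
	return 0;			/* 0 on success, < 0 on error */
}

/* Caller in the style described in the thread: nonzero means failure. */
static int init_with_rng(int (*generate)(unsigned char *, size_t))
{
	unsigned char salt[8];
	int err;

	assert(generate);

	err = generate(salt, sizeof(salt));
	if (err)
		return err;	/* a positive length is misread as an error */
	return 0;
}
```

With these definitions, init_with_rng(generate_returns_len) returns 8 and
the caller bails out even though the buffer was filled, whereas
init_with_rng(generate_returns_zero) returns 0.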


[PATCH 1/2] sun4i_ss_prng: fix return value of sun4i_ss_prng_generate

2018-02-06 Thread Artem Savkov
According to crypto/rng.h, the generate function should return 0 on success
and < 0 on error.

Fixes: b8ae5c7387ad ("crypto: sun4i-ss - support the Security System PRNG")
Signed-off-by: Artem Savkov <artem.sav...@gmail.com>
---
 drivers/crypto/sunxi-ss/sun4i-ss-prng.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-prng.c b/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
index 0d01d1624252..5754e0b92fb0 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
@@ -52,5 +52,5 @@ int sun4i_ss_prng_generate(struct crypto_rng *tfm, const u8 *src,
 
writel(0, ss->base + SS_CTL);
spin_unlock(&ss->slock);
-   return dlen;
+   return 0;
 }
-- 
2.15.1



[PATCH 2/2] sun4i_ss_prng: convert lock to _bh in sun4i_ss_prng_generate

2018-02-06 Thread Artem Savkov
Lockdep detects a possible deadlock in sun4i_ss_prng_generate() and
throws an "inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage" warning.
Disable softirqs while holding the lock to fix this.

Fixes: b8ae5c7387ad ("crypto: sun4i-ss - support the Security System PRNG")
Signed-off-by: Artem Savkov <artem.sav...@gmail.com>
---
 drivers/crypto/sunxi-ss/sun4i-ss-prng.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-prng.c b/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
index 5754e0b92fb0..63d636424161 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-prng.c
@@ -28,7 +28,7 @@ int sun4i_ss_prng_generate(struct crypto_rng *tfm, const u8 *src,
algt = container_of(alg, struct sun4i_ss_alg_template, alg.rng);
ss = algt->ss;
 
-   spin_lock(&ss->slock);
+   spin_lock_bh(&ss->slock);
 
writel(mode, ss->base + SS_CTL);
 
@@ -51,6 +51,6 @@ int sun4i_ss_prng_generate(struct crypto_rng *tfm, const u8 *src,
}
 
writel(0, ss->base + SS_CTL);
-   spin_unlock(&ss->slock);
+   spin_unlock_bh(&ss->slock);
return 0;
 }
-- 
2.15.1



[PATCH 0/2] sun4i_ss_prng fixes

2018-02-06 Thread Artem Savkov
IPSec hasn't been working on my A10 board since 4.14, and it turned out to be
caused by the sun4i_ss_rng driver.

Artem Savkov (2):
  sun4i_ss_prng: fix return value of sun4i_ss_prng_generate
  sun4i_ss_prng: convert lock to _bh in sun4i_ss_prng_generate

 drivers/crypto/sunxi-ss/sun4i-ss-prng.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.15.1



Re: [PATCH] xfrm: init skb_head lock for transport-mode packets

2018-01-04 Thread Artem Savkov
On Thu, Jan 04, 2018 at 10:01:32PM +1100, Herbert Xu wrote:
> On Thu, Jan 04, 2018 at 11:36:28AM +0100, Artem Savkov wrote:
> > Commit acf568ee859f "xfrm: Reinject transport-mode packets through tasklet"
> > adds an sk_buff_head queue, but never initializes trans->queue.lock, which
> > results in a "spinlock bad magic" BUG on skb_queue_tail() call in
> > xfrm_trans_queue.
> > Use skb_queue_head_init() instead of __skb_queue_head_init() to properly
> > initialize said lock.
> > 
> > Signed-off-by: Artem Savkov <asav...@redhat.com>
> 
> Thanks for catching this.  But we don't need the lock as this
> is meant to be per-CPU only.  So we should remove the locking
> instead:

Right, that's a better solution.

Reported-and-tested-by: Artem Savkov <asav...@redhat.com>

Thank you.

> ---8<---
> xfrm: Use __skb_queue_tail in xfrm_trans_queue
> 
> We do not need locking in xfrm_trans_queue because it is designed
> to use per-CPU buffers.  However, the original code incorrectly
> used skb_queue_tail which takes the lock.  This patch switches
> it to __skb_queue_tail instead.
> 
> Reported-by: Artem Savkov <asav...@redhat.com>
> Fixes: acf568ee859f ("xfrm: Reinject transport-mode packets...")
> Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
> 
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 098f47a..1eb0bba 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -511,7 +511,7 @@ int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,
>  
>   XFRM_TRANS_SKB_CB(skb)->finish = finish;
>   XFRM_TRANS_SKB_CB(skb)->net = net;
> - skb_queue_tail(&trans->queue, skb);
> + __skb_queue_tail(&trans->queue, skb);
>   tasklet_schedule(&trans->tasklet);
>   return 0;
>  }

-- 
Regards,
  Artem
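
The two candidate fixes in this thread can be contrasted with a small
user-space model. Everything below is a hypothetical sketch and not the
skb API: the magic field stands in for the spinlock debug check, and the
function names only mirror the locked and lockless variants discussed
above.

```c
#include <assert.h>

#define LOCK_MAGIC 0xdead4eadu	/* arbitrary marker for this sketch */

struct lock {
	unsigned int magic;
};

struct queue {
	struct lock lock;
	int qlen;
};

/* Like the lockless init in the thread: the lock is left untouched. */
static void queue_init_nolock(struct queue *q)
{
	assert(q);
	q->qlen = 0;
}

/* Like the full init: the lock is set up as well. */
static void queue_init(struct queue *q)
{
	assert(q);
	q->lock.magic = LOCK_MAGIC;
	q->qlen = 0;
}

/* Locked enqueue; returns -1 where a debug kernel would report bad magic. */
static int queue_tail_locked(struct queue *q)
{
	if (q->lock.magic != LOCK_MAGIC)
		return -1;
	q->qlen++;
	return 0;
}

/* Lockless enqueue; valid only when the queue is never shared, e.g. per-CPU. */
static int queue_tail_nolock(struct queue *q)
{
	q->qlen++;
	return 0;
}
```

The original patch corresponds to switching queue_init_nolock() to
queue_init(); the accepted fix instead keeps the lockless init and switches
the enqueue to queue_tail_nolock(), which relies on the queue being per-CPU.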


[PATCH] xfrm: init skb_head lock for transport-mode packets

2018-01-04 Thread Artem Savkov
Commit acf568ee859f "xfrm: Reinject transport-mode packets through tasklet"
adds an sk_buff_head queue, but never initializes trans->queue.lock, which
results in a "spinlock bad magic" BUG on skb_queue_tail() call in
xfrm_trans_queue.
Use skb_queue_head_init() instead of __skb_queue_head_init() to properly
initialize said lock.

Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 net/xfrm/xfrm_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 26b10eb7a206..d5389b9dbbb9 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -542,7 +542,7 @@ void __init xfrm_input_init(void)
 		struct xfrm_trans_tasklet *trans;
 
 		trans = &per_cpu(xfrm_trans_tasklet, i);
-		__skb_queue_head_init(&trans->queue);
+		skb_queue_head_init(&trans->queue);
 		tasklet_init(&trans->tasklet, xfrm_trans_reinject,
 			     (unsigned long)trans);
 	}
-- 
2.13.6
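
For readers of the archive, here is a minimal userspace sketch of why the
unlocked initializer combined with the locked enqueue trips a "bad magic"
check. It is a toy model, not kernel code: every toy_* name and the
LOCK_MAGIC value are hypothetical stand-ins for sk_buff_head, its spinlock,
and the skb_queue_* helpers. It also shows that the alternative discussed in
this thread (keep the unlocked initializer, switch to the unlocked enqueue)
avoids the problem.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative marker; stands in for the debug magic of a real spinlock. */
#define LOCK_MAGIC 0xdead4eadu

struct toy_lock  { unsigned int magic; int held; };
struct toy_node  { struct toy_node *next, *prev; };
struct toy_queue { struct toy_node head; unsigned int qlen; struct toy_lock lock; };

/* List-only init, in the spirit of __skb_queue_head_init(): lock untouched. */
static void toy_queue_init_nolock(struct toy_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->qlen = 0;
}

/* Full init, in the spirit of skb_queue_head_init(): lock set up as well. */
static void toy_queue_init(struct toy_queue *q)
{
	q->lock.magic = LOCK_MAGIC;
	q->lock.held = 0;
	toy_queue_init_nolock(q);
}

/* Unlocked tail insert, in the spirit of __skb_queue_tail(). */
static void toy_enqueue_nolock(struct toy_queue *q, struct toy_node *n)
{
	n->next = &q->head;
	n->prev = q->head.prev;
	q->head.prev->next = n;
	q->head.prev = n;
	q->qlen++;
}

/* Locked tail insert: returns -1 where a debug kernel would report bad magic. */
static int toy_enqueue(struct toy_queue *q, struct toy_node *n)
{
	if (q->lock.magic != LOCK_MAGIC)
		return -1;
	q->lock.held = 1;
	toy_enqueue_nolock(q, n);
	q->lock.held = 0;
	return 0;
}
```

Calling toy_enqueue() on a queue set up with toy_queue_init_nolock() fails,
while either toy_queue_init() plus toy_enqueue(), or toy_queue_init_nolock()
plus toy_enqueue_nolock(), works. The second pairing is only safe when the
queue is never shared between CPUs.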



[PATCH] xfrm: don't call xfrm_policy_cache_flush under xfrm_state_lock

2017-09-27 Thread Artem Savkov
I might be wrong but it doesn't look like xfrm_state_lock is required
for xfrm_policy_cache_flush and calling it under this lock triggers both
"sleeping function called from invalid context" and "possible circular
locking dependency detected" warnings on flush.

Fixes: ec30d78c14a8 xfrm: add xdst pcpu cache
Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 net/xfrm/xfrm_state.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 0dab1cd79ce4..12213477cd3a 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -732,12 +732,12 @@ int xfrm_state_flush(struct net *net, u8 proto, bool task_valid)
 			}
 		}
 	}
+out:
+	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
 	if (cnt) {
 		err = 0;
 		xfrm_policy_cache_flush();
 	}
-out:
-	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
 	return err;
 }
 EXPORT_SYMBOL(xfrm_state_flush);
-- 
2.13.5
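
The reordering in this patch can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not the kernel implementation: the
toy_* helpers, the atomic_depth counter and the initial -ESRCH value are
hypothetical and only mimic the shape of xfrm_state_flush(). The point is
that a function that may sleep must run after the BH spinlock is dropped.

```c
#include <assert.h>
#include <errno.h>

static int atomic_depth;     /* > 0 while the toy BH spinlock is held */
static int sleep_violations; /* counts "sleeping function called from invalid context" */

static void toy_lock_bh(void)   { atomic_depth++; }
static void toy_unlock_bh(void) { atomic_depth--; }

/* Stand-in for a flush routine that may sleep. */
static void toy_cache_flush(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

/* Shape of the code before the patch: flush runs under the lock. */
static int flush_before_fix(int cnt)
{
	int err = -ESRCH; /* assumed default when nothing was flushed */

	toy_lock_bh();
	if (cnt) {
		err = 0;
		toy_cache_flush();
	}
	toy_unlock_bh();
	return err;
}

/* Shape of the code after the patch: unlock first, then flush. */
static int flush_after_fix(int cnt)
{
	int err = -ESRCH;

	toy_lock_bh();
	/* ... walk and delete states while holding the lock ... */
	toy_unlock_bh();
	if (cnt) {
		err = 0;
		toy_cache_flush();
	}
	return err;
}
```

In this model flush_before_fix(1) records one violation and
flush_after_fix(1) records none, while both return the same error code.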



[PATCH v3] ebtables: fix race condition in frame_filter_net_init()

2017-09-26 Thread Artem Savkov
It is possible for ebt_in_hook to be triggered before ebt_table is assigned
resulting in a NULL-pointer dereference. Make sure hooks are
registered as the last step.

v3: restore erroneously removed ops == NULL case check

Fixes: aee12a0a3727 ebtables: remove nf_hook_register usage
Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 include/linux/netfilter_bridge/ebtables.h |  7 ++++---
 net/bridge/netfilter/ebtable_broute.c     |  4 ++--
 net/bridge/netfilter/ebtable_filter.c     |  4 ++--
 net/bridge/netfilter/ebtable_nat.c        |  4 ++--
 net/bridge/netfilter/ebtables.c           | 17 +++++++++--------
 5 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 2c2a5514b0df..528b24c78308 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -108,9 +108,10 @@ struct ebt_table {
 
 #define EBT_ALIGN(s) (((s) + (__alignof__(struct _xt_align)-1)) & \
 ~(__alignof__(struct _xt_align)-1))
-extern struct ebt_table *ebt_register_table(struct net *net,
-   const struct ebt_table *table,
-   const struct nf_hook_ops *);
+extern int ebt_register_table(struct net *net,
+ const struct ebt_table *table,
+ const struct nf_hook_ops *ops,
+ struct ebt_table **res);
 extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
 const struct nf_hook_ops *);
 extern unsigned int ebt_do_table(struct sk_buff *skb,
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 2585b100ebbb..276b60262981 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -65,8 +65,8 @@ static int ebt_broute(struct sk_buff *skb)
 
 static int __net_init broute_net_init(struct net *net)
 {
-   net->xt.broute_table = ebt_register_table(net, &broute_table, NULL);
-   return PTR_ERR_OR_ZERO(net->xt.broute_table);
+   return ebt_register_table(net, &broute_table, NULL,
+ &net->xt.broute_table);
 }
 
 static void __net_exit broute_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 45a00dbdbcad..c41da5fac84f 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_filter[] = {
 
 static int __net_init frame_filter_net_init(struct net *net)
 {
-   net->xt.frame_filter = ebt_register_table(net, &frame_filter, ebt_ops_filter);
-   return PTR_ERR_OR_ZERO(net->xt.frame_filter);
+   return ebt_register_table(net, &frame_filter, ebt_ops_filter,
+ &net->xt.frame_filter);
 }
 
 static void __net_exit frame_filter_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 57cd5bb154e7..08df7406ecb3 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_nat[] = {
 
 static int __net_init frame_nat_net_init(struct net *net)
 {
-   net->xt.frame_nat = ebt_register_table(net, &frame_nat, ebt_ops_nat);
-   return PTR_ERR_OR_ZERO(net->xt.frame_nat);
+   return ebt_register_table(net, &frame_nat, ebt_ops_nat,
+ &net->xt.frame_nat);
 }
 
 static void __net_exit frame_nat_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 83951f978445..3b3dcf719e07 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1169,9 +1169,8 @@ static void __ebt_unregister_table(struct net *net, struct ebt_table *table)
kfree(table);
 }
 
-struct ebt_table *
-ebt_register_table(struct net *net, const struct ebt_table *input_table,
-  const struct nf_hook_ops *ops)
+int ebt_register_table(struct net *net, const struct ebt_table *input_table,
+  const struct nf_hook_ops *ops, struct ebt_table **res)
 {
struct ebt_table_info *newinfo;
struct ebt_table *t, *table;
@@ -1183,7 +1182,7 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
repl->entries == NULL || repl->entries_size == 0 ||
repl->counters != NULL || input_table->private != NULL) {
BUGPRINT("Bad table data for ebt_register_table!!!\n");
-   return ERR_PTR(-EINVAL);
+   return -EINVAL;
}
 
/* Don't add one table to multiple lists. */
@@ -1252,16 +1251,18 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
list_add(&table->list, &net->xt.tables[NFPROTO_BRIDGE]);

[PATCH v2] ebtables: fix race condition in frame_filter_net_init()

2017-09-26 Thread Artem Savkov
It is possible for ebt_in_hook to be triggered before ebt_table is assigned
resulting in a NULL-pointer dereference. Make sure hooks are
registered as the last step.

Fixes: aee12a0a3727 ebtables: remove nf_hook_register usage
Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 include/linux/netfilter_bridge/ebtables.h |  7 ++++---
 net/bridge/netfilter/ebtable_broute.c     |  4 ++--
 net/bridge/netfilter/ebtable_filter.c     |  4 ++--
 net/bridge/netfilter/ebtable_nat.c        |  4 ++--
 net/bridge/netfilter/ebtables.c           | 17 ++++++++---------
 5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 2c2a5514b0df..528b24c78308 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -108,9 +108,10 @@ struct ebt_table {
 
 #define EBT_ALIGN(s) (((s) + (__alignof__(struct _xt_align)-1)) & \
 ~(__alignof__(struct _xt_align)-1))
-extern struct ebt_table *ebt_register_table(struct net *net,
-   const struct ebt_table *table,
-   const struct nf_hook_ops *);
+extern int ebt_register_table(struct net *net,
+ const struct ebt_table *table,
+ const struct nf_hook_ops *ops,
+ struct ebt_table **res);
 extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
 const struct nf_hook_ops *);
 extern unsigned int ebt_do_table(struct sk_buff *skb,
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 2585b100ebbb..276b60262981 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -65,8 +65,8 @@ static int ebt_broute(struct sk_buff *skb)
 
 static int __net_init broute_net_init(struct net *net)
 {
-   net->xt.broute_table = ebt_register_table(net, &broute_table, NULL);
-   return PTR_ERR_OR_ZERO(net->xt.broute_table);
+   return ebt_register_table(net, &broute_table, NULL,
+ &net->xt.broute_table);
 }
 
 static void __net_exit broute_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 45a00dbdbcad..c41da5fac84f 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_filter[] = {
 
 static int __net_init frame_filter_net_init(struct net *net)
 {
-   net->xt.frame_filter = ebt_register_table(net, &frame_filter, ebt_ops_filter);
-   return PTR_ERR_OR_ZERO(net->xt.frame_filter);
+   return ebt_register_table(net, &frame_filter, ebt_ops_filter,
+ &net->xt.frame_filter);
 }
 
 static void __net_exit frame_filter_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 57cd5bb154e7..08df7406ecb3 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_nat[] = {
 
 static int __net_init frame_nat_net_init(struct net *net)
 {
-   net->xt.frame_nat = ebt_register_table(net, &frame_nat, ebt_ops_nat);
-   return PTR_ERR_OR_ZERO(net->xt.frame_nat);
+   return ebt_register_table(net, &frame_nat, ebt_ops_nat,
+ &net->xt.frame_nat);
 }
 
 static void __net_exit frame_nat_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 83951f978445..aa81afe81f23 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1169,9 +1169,8 @@ static void __ebt_unregister_table(struct net *net, struct ebt_table *table)
kfree(table);
 }
 
-struct ebt_table *
-ebt_register_table(struct net *net, const struct ebt_table *input_table,
-  const struct nf_hook_ops *ops)
+int ebt_register_table(struct net *net, const struct ebt_table *input_table,
+  const struct nf_hook_ops *ops, struct ebt_table **res)
 {
struct ebt_table_info *newinfo;
struct ebt_table *t, *table;
@@ -1183,7 +1182,7 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
repl->entries == NULL || repl->entries_size == 0 ||
repl->counters != NULL || input_table->private != NULL) {
BUGPRINT("Bad table data for ebt_register_table!!!\n");
-   return ERR_PTR(-EINVAL);
+   return -EINVAL;
}
 
/* Don't add one table to multiple lists. */
@@ -1252,16 +1251,16 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
list_add(&table->list, &net->xt.tables[NFPROTO_BRIDGE]);
mutex_unlock(&ebt_mutex);
 
-   if (!ops)

Re: [PATCH] ebtables: fix race condition in frame_filter_net_init()

2017-09-26 Thread Artem Savkov
On Tue, Sep 26, 2017 at 02:42:11PM +0200, Florian Westphal wrote:
> Artem Savkov <asav...@redhat.com> wrote:
> > It is possible for ebt_in_hook to be triggered before ebt_table is assigned
> > resulting in a NULL-pointer dereference. Make sure hooks are
> > registered as the last step.
> 
> Right, thanks for the patch.
> 
> > --- a/net/bridge/netfilter/ebtable_broute.c
> > +++ b/net/bridge/netfilter/ebtable_broute.c
> > @@ -65,7 +65,7 @@ static int ebt_broute(struct sk_buff *skb)
> >  
> >  static int __net_init broute_net_init(struct net *net)
> >  {
> > -   net->xt.broute_table = ebt_register_table(net, &broute_table, NULL);
> > +   net->xt.broute_table = ebt_register_table(net, &broute_table);
> 
> I wonder if it makes more sense to model this like the iptables version,
> i.e. pass net->xt.table_name as last arg to ebt_register_table ...
> 
> > +int ebt_register_hooks(struct net *net, struct ebt_table *table,
> > + const struct nf_hook_ops *ops)
> > +{
> > +   int ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
> > +
> > +   if (ret)
> > +   __ebt_unregister_table(net, table);
> > +
> > +   return ret;
> > +}
> 
> ... because this looks strange (unregister of table/not-so-obvious error
> unwinding ...)
> 
> > @@ -1252,15 +1262,6 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
> > list_add(&table->list, &net->xt.tables[NFPROTO_BRIDGE]);
> > mutex_unlock(&ebt_mutex);
> 
> ... here one could then assign the net->xt.table_X pointer, and then do
> the hook registration right after.
> 
> However i have no strong opinion here.

Agreed, that does look better and requires fewer changes. I'll send a v2.

-- 
Regards,
  Artem
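
The ordering discussed above can be sketched in plain userspace C. This is a
toy model with hypothetical names (toy_table, published, toy_hook and the two
register_* variants); it does not reproduce the ebtables code. It only shows
why storing the table pointer through an out-parameter before the hooks go
live closes the window in which a hook can see a NULL table.

```c
#include <assert.h>
#include <stddef.h>

struct toy_table { int valid; };

static struct toy_table the_table = { 1 };
static struct toy_table *published; /* stands in for net->xt.frame_filter */
static int null_derefs;             /* times a hook saw a NULL table */

/* Toy hook: a real hook would dereference the table pointer unconditionally. */
static void toy_hook(void)
{
	if (!published) {
		null_derefs++;
		return;
	}
	(void)published->valid;
}

/* Toy hook registration: a packet is assumed to arrive as soon as hooks are live. */
static int toy_register_hooks(void)
{
	toy_hook();
	return 0;
}

/* Old shape: hooks are live before the caller can store the returned pointer. */
static struct toy_table *register_old(void)
{
	toy_register_hooks();
	return &the_table;
}

/* New shape: publish through the out-parameter first, register hooks last. */
static int register_new(struct toy_table **res)
{
	int ret;

	*res = &the_table;
	ret = toy_register_hooks();
	if (ret)
		*res = NULL; /* undo the publication on failure */
	return ret;
}
```

With register_old() the simulated packet hits the hook while the pointer is
still NULL; with register_new(&published) the pointer is already set when the
first packet arrives.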


[PATCH] ebtables: fix race condition in frame_filter_net_init()

2017-09-26 Thread Artem Savkov
It is possible for ebt_in_hook to be triggered before ebt_table is assigned
resulting in a NULL-pointer dereference. Make sure hooks are
registered as the last step.

Fixes: aee12a0a3727 ebtables: remove nf_hook_register usage
Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 include/linux/netfilter_bridge/ebtables.h |  5 +++--
 net/bridge/netfilter/ebtable_broute.c     |  2 +-
 net/bridge/netfilter/ebtable_filter.c     |  8 ++++++--
 net/bridge/netfilter/ebtable_nat.c        |  8 ++++++--
 net/bridge/netfilter/ebtables.c           | 24 +++++++++++++-----------
 5 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 2c2a5514b0df..7d68f5ba6ded 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -109,8 +109,9 @@ struct ebt_table {
 #define EBT_ALIGN(s) (((s) + (__alignof__(struct _xt_align)-1)) & \
 ~(__alignof__(struct _xt_align)-1))
 extern struct ebt_table *ebt_register_table(struct net *net,
-   const struct ebt_table *table,
-   const struct nf_hook_ops *);
+   const struct ebt_table *table);
+extern int ebt_register_hooks(struct net *net, struct ebt_table *table,
+const struct nf_hook_ops *ops);
 extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
 const struct nf_hook_ops *);
 extern unsigned int ebt_do_table(struct sk_buff *skb,
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 2585b100ebbb..b41017409aa5 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -65,7 +65,7 @@ static int ebt_broute(struct sk_buff *skb)
 
 static int __net_init broute_net_init(struct net *net)
 {
-   net->xt.broute_table = ebt_register_table(net, &broute_table, NULL);
+   net->xt.broute_table = ebt_register_table(net, &broute_table);
return PTR_ERR_OR_ZERO(net->xt.broute_table);
 }
 
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 45a00dbdbcad..ca04582b374e 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -93,8 +93,12 @@ static const struct nf_hook_ops ebt_ops_filter[] = {
 
 static int __net_init frame_filter_net_init(struct net *net)
 {
-   net->xt.frame_filter = ebt_register_table(net, &frame_filter, ebt_ops_filter);
-   return PTR_ERR_OR_ZERO(net->xt.frame_filter);
+   net->xt.frame_filter = ebt_register_table(net, &frame_filter);
+
+   if (IS_ERR(net->xt.frame_filter))
+   return PTR_ERR(net->xt.frame_filter);
+
+   return ebt_register_hooks(net, net->xt.frame_filter, ebt_ops_filter);
 }
 
 static void __net_exit frame_filter_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 57cd5bb154e7..f4a2ff93be34 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -93,8 +93,12 @@ static const struct nf_hook_ops ebt_ops_nat[] = {
 
 static int __net_init frame_nat_net_init(struct net *net)
 {
-   net->xt.frame_nat = ebt_register_table(net, &frame_nat, ebt_ops_nat);
-   return PTR_ERR_OR_ZERO(net->xt.frame_nat);
+   net->xt.frame_nat = ebt_register_table(net, &frame_nat);
+
+   if (IS_ERR(net->xt.frame_nat))
+   return PTR_ERR(net->xt.frame_nat);
+
+   return ebt_register_hooks(net, net->xt.frame_nat, ebt_ops_nat);
 }
 
 static void __net_exit frame_nat_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 83951f978445..e72120ac426e 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1169,9 +1169,19 @@ static void __ebt_unregister_table(struct net *net, struct ebt_table *table)
kfree(table);
 }
 
+int ebt_register_hooks(struct net *net, struct ebt_table *table,
+ const struct nf_hook_ops *ops)
+{
+   int ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
+
+   if (ret)
+   __ebt_unregister_table(net, table);
+
+   return ret;
+}
+
 struct ebt_table *
-ebt_register_table(struct net *net, const struct ebt_table *input_table,
-  const struct nf_hook_ops *ops)
+ebt_register_table(struct net *net, const struct ebt_table *input_table)
 {
struct ebt_table_info *newinfo;
struct ebt_table *t, *table;
@@ -1252,15 +1262,6 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
list_add(&table->list, &net->xt.tables[NFPROTO_BRIDGE]);
mutex_unlock(&ebt_mutex);
 
-   if (!ops)
-   return table;
-
-   ret = nf_register_net

[PATCH] ebtables: fix race condition in frame_filter_net_init()

2017-09-26 Thread Artem Savkov
It is possible for ebt_in_hook to be triggered before ebt_table is assigned
resulting in a NULL-pointer dereference. Make sure hooks are
registered as the last step.

Fixes: aee12a0a3727 ebtables: remove nf_hook_register usage
Signed-off-by: Artem Savkov 
---
 include/linux/netfilter_bridge/ebtables.h |  5 +++--
 net/bridge/netfilter/ebtable_broute.c |  2 +-
 net/bridge/netfilter/ebtable_filter.c |  8 ++--
 net/bridge/netfilter/ebtable_nat.c|  8 ++--
 net/bridge/netfilter/ebtables.c   | 24 +---
 5 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h 
b/include/linux/netfilter_bridge/ebtables.h
index 2c2a5514b0df..7d68f5ba6ded 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -109,8 +109,9 @@ struct ebt_table {
 #define EBT_ALIGN(s) (((s) + (__alignof__(struct _xt_align)-1)) & \
 ~(__alignof__(struct _xt_align)-1))
 extern struct ebt_table *ebt_register_table(struct net *net,
-   const struct ebt_table *table,
-   const struct nf_hook_ops *);
+   const struct ebt_table *table);
+extern int ebt_register_hooks(struct net *net, struct ebt_table *table,
+const struct nf_hook_ops *ops);
 extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
 const struct nf_hook_ops *);
 extern unsigned int ebt_do_table(struct sk_buff *skb,
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 2585b100ebbb..b41017409aa5 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -65,7 +65,7 @@ static int ebt_broute(struct sk_buff *skb)
 
 static int __net_init broute_net_init(struct net *net)
 {
-   net->xt.broute_table = ebt_register_table(net, &broute_table, NULL);
+   net->xt.broute_table = ebt_register_table(net, &broute_table);
return PTR_ERR_OR_ZERO(net->xt.broute_table);
 }
 
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 45a00dbdbcad..ca04582b374e 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -93,8 +93,12 @@ static const struct nf_hook_ops ebt_ops_filter[] = {
 
 static int __net_init frame_filter_net_init(struct net *net)
 {
-   net->xt.frame_filter = ebt_register_table(net, &frame_filter, ebt_ops_filter);
-   return PTR_ERR_OR_ZERO(net->xt.frame_filter);
+   net->xt.frame_filter = ebt_register_table(net, &frame_filter);
+
+   if (IS_ERR(net->xt.frame_filter))
+   return PTR_ERR(net->xt.frame_filter);
+
+   return ebt_register_hooks(net, net->xt.frame_filter, ebt_ops_filter);
 }
 
 static void __net_exit frame_filter_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 57cd5bb154e7..f4a2ff93be34 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -93,8 +93,12 @@ static const struct nf_hook_ops ebt_ops_nat[] = {
 
 static int __net_init frame_nat_net_init(struct net *net)
 {
-   net->xt.frame_nat = ebt_register_table(net, &frame_nat, ebt_ops_nat);
-   return PTR_ERR_OR_ZERO(net->xt.frame_nat);
+   net->xt.frame_nat = ebt_register_table(net, &frame_nat);
+
+   if (IS_ERR(net->xt.frame_nat))
+   return PTR_ERR(net->xt.frame_nat);
+
+   return ebt_register_hooks(net, net->xt.frame_nat, ebt_ops_nat);
 }
 
 static void __net_exit frame_nat_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 83951f978445..e72120ac426e 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1169,9 +1169,19 @@ static void __ebt_unregister_table(struct net *net, struct ebt_table *table)
kfree(table);
 }
 
+int ebt_register_hooks(struct net *net, struct ebt_table *table,
+ const struct nf_hook_ops *ops)
+{
+   int ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
+
+   if (ret)
+   __ebt_unregister_table(net, table);
+
+   return ret;
+}
+
 struct ebt_table *
-ebt_register_table(struct net *net, const struct ebt_table *input_table,
-  const struct nf_hook_ops *ops)
+ebt_register_table(struct net *net, const struct ebt_table *input_table)
 {
struct ebt_table_info *newinfo;
struct ebt_table *t, *table;
@@ -1252,15 +1262,6 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
list_add(&table->list, &net->xt.tables[NFPROTO_BRIDGE]);
mutex_unlock(&ebt_mutex);
 
-   if (!ops)
-   return table;
-
-   ret = nf_register_net_hooks
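
The ordering problem described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct table`, `published_table`, `registered_hook`, `register_table()`, `register_hooks()` and `net_init()` are hypothetical stand-ins, not kernel interfaces. The point is that a hook dereferencing a shared pointer must not become callable until that pointer has been assigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel interfaces. */
struct table { int valid_hooks; };

static struct table the_table = { .valid_hooks = 3 };

/* Plays the role of the per-net table pointer (e.g. net->xt.frame_filter). */
static struct table *published_table;

/* Plays the role of a registered netfilter hook; NULL means "not registered". */
static int (*registered_hook)(void);

/* The hook dereferences the published pointer, so it must never see NULL. */
static int table_hook(void)
{
	assert(published_table != NULL);
	return published_table->valid_hooks;
}

static struct table *register_table(void)
{
	return &the_table;
}

static int register_hooks(void)
{
	registered_hook = table_hook;
	return 0;
}

/* Order used by the patch: publish the table first, register hooks last. */
static int net_init(void)
{
	published_table = register_table();
	if (published_table == NULL)
		return -1;

	return register_hooks();
}
```

With this order, any caller that finds `registered_hook` set is guaranteed to find `published_table` set as well. Registering the hook inside `register_table()`, before the caller stores the returned pointer, would reopen the window the patch closes.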

Re: MADV_FREE is broken

2017-09-21 Thread Artem Savkov
On Wed, Sep 20, 2017 at 03:37:33PM -0700, Shaohua Li wrote:
> On Wed, Sep 20, 2017 at 11:01:47AM +0200, Artem Savkov wrote:
> > Hi All,
> > 
> > > We recently started noticing the madvise09[1] test from ltp failing
> > > strangely. The test does the following: maps 32 pages, sets MADV_FREE
> > > for the range it got, dirties 2 of the pages, creates memory pressure
> > > and checks that nondirty pages are free. The test hung while accessing
> > > the last 4 pages (29-32) of the madvised range at line 121 [2]. Any
> > > other process (gdb/cat) accessing those pages would also hang, as would
> > > rebooting the machine. It doesn't trigger any debug warnings or kasan.
> > >
> > > The issue bisected to "802a3a92ad7a mm: reclaim MADV_FREE pages" (so 4.12
> > > and up are affected).
> > >
> > > I did some poking around and found out that the "bad" pages had the
> > > SwapBacked flag set in shrink_page_list(), which confused it a lot. It
> > > looks like mark_page_lazyfree() only calls lru_lazyfree_fn() when the
> > > pagevec is full (that is, in batches of 14) and never drains the rest
> > > (so the last four in the madvise09 case).
> > >
> > > The patch below greatly reduces the failure rate but doesn't fix it
> > > completely: it still shows up with the same symptoms (hanging while
> > > trying to access the last 4 pages) after a bunch of retries.
> > >
> > > [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c
> > > [2] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c#L121
> > 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 21261ff0466f..a0b868e8b7d2 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -453,6 +453,7 @@ static void madvise_free_page_range(struct mmu_gather 
> > *tlb,
> >  
> > tlb_start_vma(tlb, vma);
> > walk_page_range(addr, end, _walk);
> > +   lru_add_drain();
> > tlb_end_vma(tlb, vma);
> >  }
> 
> Looks like there is a race between clearing the pte dirty bit and clearing
> the SwapBacked bit. Draining the pagevec helps a little, but is not
> sufficient. If SwapBacked is set, we could add the page to swapcache, but
> since the page isn't dirty, we don't write the page out. This could cause
> data corruption. There is another place where we wrongly clear the
> SwapBacked bit. Below is a test patch which seems to fix the issue, please
> give it a try.

Ran it for quite some time and it does seem to fix the issue.

> diff --git a/mm/swap.c b/mm/swap.c
> index 62d96b8..5c58257 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -575,7 +575,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>   void *arg)
>  {
>   if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> - !PageUnevictable(page)) {
> + !PageSwapCache(page) && !PageUnevictable(page)) {
>   bool active = PageActive(page);
>  
>   del_page_from_lru_list(page, lruvec,

Shouldn't the same check be added to mark_page_lazyfree()?

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 13d711d..be1c98e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -980,6 +980,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>   int may_enter_fs;
>   enum page_references references = PAGEREF_RECLAIM_CLEAN;
>   bool dirty, writeback;
> + bool new_added_swapcache = false;
>  
>   cond_resched();
>  
> @@ -1165,6 +1166,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  
>   /* Adding to swap updated mapping */
>   mapping = page_mapping(page);
> + new_added_swapcache = true;
>   }
>   } else if (unlikely(PageTransHuge(page))) {
>   /* Split file THP */
> @@ -1185,6 +1187,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>   nr_unmap_fail++;
>   goto activate_locked;
>   }
> + /* race with MADV_FREE */
> + if (PageAnon(page) && !PageDirty(page) &&
> + PageSwapBacked(page) && new_added_swapcache)
> + set_page_dirty(page);
>   }
>  
>   if (PageDirty(page)) {

-- 
Regards,
  Artem
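
For readers following the quoted vmscan.c hunk, the condition it adds can be restated as a small standalone predicate. This is a sketch only: the flag bits and the names `needs_redirty()` and `fixup_flags()` are hypothetical and do not match the kernel's page flag layout or helpers. It simply encodes "anonymous, not dirty, still SwapBacked, and just added to swapcache", the case where the page would otherwise be treated as clean and not written out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; not the kernel's page flag layout. */
enum {
	PG_ANON       = 1u << 0,
	PG_DIRTY      = 1u << 1,
	PG_SWAPBACKED = 1u << 2,
};

/*
 * Restates the condition from the quoted hunk: an anonymous page that is
 * not dirty, is still SwapBacked and was just added to swapcache.
 */
static bool needs_redirty(unsigned int flags, bool newly_added_to_swapcache)
{
	return (flags & PG_ANON) && !(flags & PG_DIRTY) &&
	       (flags & PG_SWAPBACKED) && newly_added_to_swapcache;
}

/* Apply the fixup: mark the page dirty when the condition holds. */
static unsigned int fixup_flags(unsigned int flags, bool newly_added_to_swapcache)
{
	if (needs_redirty(flags, newly_added_to_swapcache))
		flags |= PG_DIRTY;

	/* After the fixup the condition can no longer hold. */
	assert(!needs_redirty(flags, newly_added_to_swapcache));
	return flags;
}
```

A page that lost SwapBacked (i.e. was successfully marked lazyfree) or that was already dirty is left untouched, which matches the intent stated in the quoted comment about racing with MADV_FREE.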


MADV_FREE is broken

2017-09-20 Thread Artem Savkov
Hi All,

We recently started noticing the madvise09[1] test from ltp failing strangely.
The test does the following: maps 32 pages, sets MADV_FREE for the range it got,
dirties 2 of the pages, creates memory pressure and checks that nondirty pages
are free. The test hung while accessing the last 4 pages (29-32) of the madvised
range at line 121 [2]. Any other process (gdb/cat) accessing those pages
would also hang, as would rebooting the machine. It doesn't trigger any debug
warnings or kasan.

The issue bisected to "802a3a92ad7a mm: reclaim MADV_FREE pages" (so 4.12 and
up are affected).

I did some poking around and found out that the "bad" pages had the SwapBacked
flag set in shrink_page_list(), which confused it a lot. It looks like
mark_page_lazyfree() only calls lru_lazyfree_fn() when the pagevec is full
(that is, in batches of 14) and never drains the rest (so the last four in the
madvise09 case).

The patch below greatly reduces the failure rate but doesn't fix it
completely: it still shows up with the same symptoms (hanging while trying to
access the last 4 pages) after a bunch of retries.

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c
[2] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c#L121

diff --git a/mm/madvise.c b/mm/madvise.c
index 21261ff0466f..a0b868e8b7d2 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -453,6 +453,7 @@ static void madvise_free_page_range(struct mmu_gather *tlb,
 
tlb_start_vma(tlb, vma);
walk_page_range(addr, end, &free_walk);
+   lru_add_drain();
tlb_end_vma(tlb, vma);
 }
 

-- 
Regards,
  Artem
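
The "last four pages" arithmetic in the report (32 pages queued, batches of 14) can be checked with a tiny userspace model. This is a sketch under assumptions: `struct batch`, `batch_add()`, `batch_drain()` and `queue_items()` are hypothetical names standing in for a per-CPU pagevec and its helpers, and the only number taken from the report is the batch size of 14.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 14	/* batch size quoted in the report */

/* Hypothetical batch buffer; a stand-in for a per-CPU pagevec. */
struct batch {
	size_t pending;		/* items queued but not yet processed */
	size_t processed;	/* items handed to the callback so far */
};

/* Process everything queued so far (what an explicit drain would do). */
static void batch_drain(struct batch *b)
{
	b->processed += b->pending;
	b->pending = 0;
}

/* Queue one item; processing only happens once the batch is full. */
static void batch_add(struct batch *b)
{
	assert(b->pending < BATCH_SIZE);
	b->pending++;
	if (b->pending == BATCH_SIZE)
		batch_drain(b);
}

/* Queue n items and report how many are left unprocessed. */
static size_t queue_items(struct batch *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		batch_add(b);
	return b->pending;
}
```

Queueing 32 items fills two batches (28 processed) and leaves 4 pending, which lines up with pages 29-32 in the report. An explicit `batch_drain()` afterwards, the analogue of the `lru_add_drain()` call in the patch, empties the remainder.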


Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem

2017-08-31 Thread Artem Savkov
On Thu, Aug 31, 2017 at 02:09:51PM +0200, Michal Hocko wrote:
> On Thu 31-08-17 13:10:06, Artem Savkov wrote:
> > Hi Michal,
> > 
> > On Wed, Aug 30, 2017 at 05:43:15PM +0200, Michal Hocko wrote:
> > > > The previous patch is insufficient. drain_all_stock can still race with
> > > > the memory offline callback, and the underlying memcg can disappear. So
> > > > we need to be more careful and pin the css on the memcg. This patch
> > > > instead...
> > 
> > Tried this on top of rc7 and it does fix the splat for me.
> 
> Thanks for testing! Can I assume your Tested-by?

Didn't test much more than the case that was causing it, but yes.

Reported-and-tested-by: Artem Savkov <asav...@redhat.com>

-- 
Regards,
  Artem


Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem

2017-08-31 Thread Artem Savkov
Hi Michal,

On Wed, Aug 30, 2017 at 05:43:15PM +0200, Michal Hocko wrote:
> The previous patch is insufficient. drain_all_stock can still race with
> the memory offline callback, and the underlying memcg can disappear. So we
> need to be more careful and pin the css on the memcg. This patch
> instead...

Tried this on top of rc7 and it does fix the splat for me.

> ---
> From 70a5acf9bbe76d183e81a1a6b57dd5b9edc677c6 Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Wed, 30 Aug 2017 16:09:01 +0200
> Subject: [PATCH] mm, memcg: remove hotplug locking from try_charge
> 
> The following lockdep splat has been noticed during LTP testing
> 
> [21002.630252] ==
> [21002.637148] WARNING: possible circular locking dependency detected
> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
> [21002.649583] --
> [21002.656492] a.out/4771 is trying to acquire lock:
> [21002.661742]  (cpu_hotplug_lock.rw_sem){++}, at: [] drain_all_stock.part.35+0x18/0x140
> [21002.672629]
> [21002.672629] but task is already holding lock:
> [21002.679137]  (&mm->mmap_sem){++}, at: [] __do_page_fault+0x175/0x530
> [21002.688371]
> [21002.688371] which lock already depends on the new lock.
> [21002.688371]
> [21002.697505]
> [21002.697505] the existing dependency chain (in reverse order) is:
> [21002.705856]
> [21002.705856] -> #3 (&mm->mmap_sem){++}:
> [21002.712080]lock_acquire+0xc9/0x230
> [21002.716661]__might_fault+0x70/0xa0
> [21002.721241]_copy_to_user+0x23/0x70
> [21002.725814]filldir+0xa7/0x110
> [21002.729988]xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
> [21002.736840]xfs_readdir+0x1fa/0x2c0 [xfs]
> [21002.742042]xfs_file_readdir+0x30/0x40 [xfs]
> [21002.747485]iterate_dir+0x17a/0x1a0
> [21002.752057]SyS_getdents+0xb0/0x160
> [21002.756638]entry_SYSCALL_64_fastpath+0x1f/0xbe
> [21002.762371]
> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++}:
> [21002.769661]lock_acquire+0xc9/0x230
> [21002.774239]down_read+0x51/0xb0
> [21002.778429]lookup_slow+0xde/0x210
> [21002.782903]walk_component+0x160/0x250
> [21002.787765]link_path_walk+0x1a6/0x610
> [21002.792625]path_openat+0xe4/0xd50
> [21002.797100]do_filp_open+0x91/0x100
> [21002.801673]file_open_name+0xf5/0x130
> [21002.806429]filp_open+0x33/0x50
> [21002.810620]kernel_read_file_from_path+0x39/0x80
> [21002.816459]_request_firmware+0x39f/0x880
> [21002.821610]request_firmware_direct+0x37/0x50
> [21002.827151]request_microcode_fw+0x64/0xe0
> [21002.832401]reload_store+0xf7/0x180
> [21002.836974]dev_attr_store+0x18/0x30
> [21002.841641]sysfs_kf_write+0x44/0x60
> [21002.846318]kernfs_fop_write+0x113/0x1a0
> [21002.851374]__vfs_write+0x37/0x170
> [21002.855849]vfs_write+0xc7/0x1c0
> [21002.860128]SyS_write+0x58/0xc0
> [21002.864313]do_syscall_64+0x6c/0x1f0
> [21002.868973]return_from_SYSCALL_64+0x0/0x7a
> [21002.874317]
> [21002.874317] -> #1 (microcode_mutex){+.+.+.}:
> [21002.880748]lock_acquire+0xc9/0x230
> [21002.885322]__mutex_lock+0x88/0x960
> [21002.889894]mutex_lock_nested+0x1b/0x20
> [21002.894854]microcode_init+0xbb/0x208
> [21002.899617]do_one_initcall+0x51/0x1a9
> [21002.904481]kernel_init_freeable+0x208/0x2a7
> [21002.909922]kernel_init+0xe/0x104
> [21002.914298]ret_from_fork+0x2a/0x40
> [21002.918867]
> [21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++}:
> [21002.926058]__lock_acquire+0x153c/0x1550
> [21002.931112]lock_acquire+0xc9/0x230
> [21002.935688]cpus_read_lock+0x4b/0x90
> [21002.940353]drain_all_stock.part.35+0x18/0x140
> [21002.945987]try_charge+0x3ab/0x6e0
> [21002.950460]mem_cgroup_try_charge+0x7f/0x2c0
> [21002.955902]shmem_getpage_gfp+0x25f/0x1050
> [21002.961149]shmem_fault+0x96/0x200
> [21002.965621]__do_fault+0x1e/0xa0
> [21002.969905]__handle_mm_fault+0x9c3/0xe00
> [21002.975056]handle_mm_fault+0x16e/0x380
> [21002.980013]__do_page_fault+0x24a/0x530
> [21002.984968]do_page_fault+0x30/0x80
> [21002.989537]page_fault+0x28/0x30
> [21002.993812]
> [21002.993812] other info that might help us debug this:
> [21002.993812]
> [21003.002744] Chain exists of:
> [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
> [21003.002744]
> [21003.016238]  Possible unsafe locking scenario:
> [21003.016238]
> [21003.022843]        CPU0                    CPU1
> [21003.027896]        ----                    ----
> [21003.032948]   lock(&mm->mmap_sem);
> [21003.036741]                                lock(&type->i_mutex_dir_key#3);
> [21003.044419]                                lock(&mm->mmap_sem);
> 
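
The dependency chain in the splat above can be modelled as a small directed graph, where an edge A -> B means "B was acquired while A was held". The sketch below is a userspace illustration, not lockdep's implementation; the enum names and the helpers `record()`, `closes_cycle()` and `record_known_chain()` are hypothetical. It records the three existing edges from entries #1 to #3 and then asks whether the acquisition in entry #0 (cpu_hotplug_lock taken under mmap_sem) would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The four lock classes named in the splat. */
enum lock_id { CPU_HOTPLUG, MICROCODE, DIR_MUTEX, MMAP_SEM, NR_LOCKS };

/* order[a][b] is true when lock b has been taken while a was held. */
static bool order[NR_LOCKS][NR_LOCKS];

/* Depth-first search: can we get from 'from' to 'to' along recorded edges? */
static bool reachable(enum lock_id from, enum lock_id to, bool *seen)
{
	if (from == to)
		return true;

	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (order[from][next] && !seen[next] &&
		    reachable((enum lock_id)next, to, seen))
			return true;
	}
	return false;
}

/*
 * Taking 'wanted' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'wanted' through previously recorded edges.
 */
static bool closes_cycle(enum lock_id held, enum lock_id wanted)
{
	bool seen[NR_LOCKS];

	memset(seen, 0, sizeof(seen));
	return reachable(wanted, held, seen);
}

/* Record that 'wanted' was acquired while 'held' was held. */
static void record(enum lock_id held, enum lock_id wanted)
{
	assert(held != wanted);
	order[held][wanted] = true;
}

/* Edges from entries #1, #2 and #3 of the splat. */
static void record_known_chain(void)
{
	record(CPU_HOTPLUG, MICROCODE);	/* microcode_init */
	record(MICROCODE, DIR_MUTEX);	/* firmware load path lookup */
	record(DIR_MUTEX, MMAP_SEM);	/* getdents copying to user space */
}
```

After `record_known_chain()`, `closes_cycle(MMAP_SEM, CPU_HOTPLUG)` is true, which corresponds to the try_charge() path in entry #0. Removing the hotplug locking from try_charge(), as the quoted patch subject says, removes that last edge rather than any of the older ones.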

Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem

2017-08-16 Thread Artem Savkov
On Wed, Aug 16, 2017 at 03:39:14PM +0200, Laurent Dufour wrote:
> On 15/08/2017 21:01, Paul E. McKenney wrote:
> > On Mon, Aug 07, 2017 at 04:09:47PM +0200, Artem Savkov wrote:
> >> Hello,
> >>
> >> After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
> >> the following lockdep splat started showing up on some systems while
> >> running ltp's madvise06 test (right after first dirty_pages call [1]).
> > 
> > Hello, Artem,
> > 
> > Have you tried running this with Laurent Dufour's speculative page-fault
> > patch set?  https://lwn.net/Articles/730160/
> 
> Hello Artem, Hello Paul,
> 
> It would be a good idea to give it a try, but I don't think it will help
> here, as the speculative page fault handler is aborted if vma->ops is set,
> which is the case in the following stack trace of CPU #0.
> 
> That being said, the speculative page fault handler may also fail due to
> VMA changes occurring behind our back. In such a case, the legacy page fault
> handler is tried instead, and the same lock dependency is expected to show
> up again.

I've tried with the patch set on top of rc5 and the warning is still there,
except that the cpu_hotplug_lock path has an extra call to
handle_pte_fault() between handle_mm_fault() and __do_fault().

[   32.036924] -> #0 (cpu_hotplug_lock.rw_sem){++}:
[   32.037628]__lock_acquire+0x153c/0x1550
[   32.038231]lock_acquire+0xc9/0x230
[   32.038740]cpus_read_lock+0x4b/0x90
[   32.039292]drain_all_stock.part.33+0x18/0x140
[   32.039815]try_charge+0x3ab/0x6e0
[   32.040273]mem_cgroup_try_charge+0x82/0x330
[   32.040845]shmem_getpage_gfp+0x268/0x1050
[   32.041421]shmem_fault+0x96/0x200
[   32.041955]__do_fault+0x1e/0xa0
[   32.042354]handle_pte_fault+0x4bd/0x980
[   32.042923]__handle_mm_fault+0x21b/0x520
[   32.043497]handle_mm_fault+0x16e/0x380
[   32.044070]__do_page_fault+0x3fe/0x5a0
[   32.044532]do_page_fault+0x30/0x80
[   32.045004]page_fault+0x28/0x30


> 
> Cheers,
> Laurent.
> 
> > 
> >> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136
> >>
> >> [21002.630252] ==
> >> [21002.637148] WARNING: possible circular locking dependency detected
> >> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
> >> [21002.649583] --
> >> [21002.656492] a.out/4771 is trying to acquire lock:
> >> [21002.661742]  (cpu_hotplug_lock.rw_sem){++}, at: [] drain_all_stock.part.35+0x18/0x140
> >> [21002.672629] 
> >> [21002.672629] but task is already holding lock:
> >> [21002.679137]  (&mm->mmap_sem){++}, at: [] __do_page_fault+0x175/0x530
> >> [21002.688371] 
> >> [21002.688371] which lock already depends on the new lock.
> >> [21002.688371] 
> >> [21002.697505] 
> >> [21002.697505] the existing dependency chain (in reverse order) is:
> >> [21002.705856] 
> >> [21002.705856] -> #3 (&mm->mmap_sem){++}:
> >> [21002.712080]lock_acquire+0xc9/0x230
> >> [21002.716661]__might_fault+0x70/0xa0
> >> [21002.721241]_copy_to_user+0x23/0x70
> >> [21002.725814]filldir+0xa7/0x110
> >> [21002.729988]xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
> >> [21002.736840]xfs_readdir+0x1fa/0x2c0 [xfs]
> >> [21002.742042]xfs_file_readdir+0x30/0x40 [xfs]
> >> [21002.747485]iterate_dir+0x17a/0x1a0
> >> [21002.752057]SyS_getdents+0xb0/0x160
> >> [21002.756638]entry_SYSCALL_64_fastpath+0x1f/0xbe
> >> [21002.762371] 
> >> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++}:
> >> [21002.769661]lock_acquire+0xc9/0x230
> >> [21002.774239]down_read+0x51/0xb0
> >> [21002.778429]lookup_slow+0xde/0x210
> >> [21002.782903]walk_component+0x160/0x250
> >> [21002.787765]link_path_walk+0x1a6/0x610
> >> [21002.792625]path_openat+0xe4/0xd50
> >> [21002.797100]do_filp_open+0x91/0x100
> >> [21002.801673]file_open_name+0xf5/0x130
> >> [21002.806429]filp_open+0x33/0x50
> >> [21002.810620]kernel_read_file_from_path+0x39/0x80
> >> [21002.816459]_request_firmware+0x39f/0x880
> >> [21002.821610]request_firmware_direct+0x37/0x50
> >> [21002.827151]request_microcode_fw+0x64

[PATCH v2] iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device

2017-08-08 Thread Artem Savkov
Commit c54451a "iommu/arm-smmu: Fix the error path in arm_smmu_add_device"
removed the fwspec assignment in the legacy_binding path as redundant, which
is wrong. The local fwspec pointer needs to be updated after the fwspec is
initialised in arm_smmu_register_legacy_master(), as it is dereferenced
later. Without this there is a NULL-pointer dereference panic during boot on
some hosts.

Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 drivers/iommu/arm-smmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index b97188a..2d80fa8 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1519,6 +1519,13 @@ static int arm_smmu_add_device(struct device *dev)
 
 	if (using_legacy_binding) {
 		ret = arm_smmu_register_legacy_master(dev, &smmu);
+
+		/*
+		 * If dev->iommu_fwspec is initially NULL, arm_smmu_register_legacy_master()
+		 * will allocate/initialise a new one. Thus we need to update fwspec for
+		 * later use.
+		 */
+		fwspec = dev->iommu_fwspec;
 		if (ret)
 			goto out_free;
 	} else if (fwspec && fwspec->ops == &arm_smmu_ops) {
-- 
2.7.5
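
For readers following along, the bug pattern fixed above can be sketched in
plain userspace C. This is only an illustration, not the driver code: the
types and functions (fwspec_model, device_model,
register_legacy_master_model, add_device_model) are hypothetical stand-ins,
and the refresh flag exists only to compare the behaviour with and without
the added assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct iommu_fwspec and struct device. */
struct fwspec_model { int ids; };
struct device_model { struct fwspec_model *fwspec; };

/* Models a callee that allocates dev->fwspec when it is initially NULL. */
static int register_legacy_master_model(struct device_model *dev)
{
	if (!dev->fwspec) {
		dev->fwspec = calloc(1, sizeof(*dev->fwspec));
		if (!dev->fwspec)
			return -1;
	}
	dev->fwspec->ids = 1;
	return 0;
}

/*
 * Returns the ids field on success, -1 if registration fails, and -2 where
 * the cached pointer is still NULL (the point at which real code would
 * dereference a NULL pointer).
 */
static int add_device_model(struct device_model *dev, int refresh)
{
	struct fwspec_model *fwspec = dev->fwspec; /* cached before the call */

	if (register_legacy_master_model(dev))
		return -1;
	if (refresh)
		fwspec = dev->fwspec; /* re-read after the callee allocated it */
	if (!fwspec)
		return -2;
	return fwspec->ids;
}
```

With a device whose fwspec starts out NULL, add_device_model(dev, 0) hits
the stale-pointer case while add_device_model(dev, 1) sees the freshly
allocated structure.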



[PATCH] iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device

2017-08-08 Thread Artem Savkov
Commit c54451a "iommu/arm-smmu: Fix the error path in arm_smmu_add_device"
removed the fwspec assignment in the legacy_binding path as redundant, which
is wrong. The local fwspec pointer needs to be updated after the fwspec is
initialised in arm_smmu_register_legacy_master(), as it is dereferenced
later. Without this there is a NULL-pointer dereference panic during boot on
some hosts.

Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 drivers/iommu/arm-smmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index b97188a..95f1c86 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1519,6 +1519,7 @@ static int arm_smmu_add_device(struct device *dev)
 
 	if (using_legacy_binding) {
 		ret = arm_smmu_register_legacy_master(dev, &smmu);
+		fwspec = dev->iommu_fwspec;
 		if (ret)
 			goto out_free;
 	} else if (fwspec && fwspec->ops == &arm_smmu_ops) {
-- 
2.7.5



possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem

2017-08-07 Thread Artem Savkov
Hello,

After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
the following lockdep splat started showing up on some systems while running
ltp's madvise06 test (right after the first dirty_pages() call [1]).

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136

[21002.630252] ======================================================
[21002.637148] WARNING: possible circular locking dependency detected
[21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
[21002.649583] ------------------------------------------------------
[21002.656492] a.out/4771 is trying to acquire lock:
[21002.661742]  (cpu_hotplug_lock.rw_sem){++}, at: [] drain_all_stock.part.35+0x18/0x140
[21002.672629]
[21002.672629] but task is already holding lock:
[21002.679137]  (&mm->mmap_sem){++}, at: [] __do_page_fault+0x175/0x530
[21002.688371]
[21002.688371] which lock already depends on the new lock.
[21002.688371]
[21002.697505]
[21002.697505] the existing dependency chain (in reverse order) is:
[21002.705856]
[21002.705856] -> #3 (&mm->mmap_sem){++}:
[21002.712080]        lock_acquire+0xc9/0x230
[21002.716661]        __might_fault+0x70/0xa0
[21002.721241]        _copy_to_user+0x23/0x70
[21002.725814]        filldir+0xa7/0x110
[21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
[21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
[21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
[21002.747485]        iterate_dir+0x17a/0x1a0
[21002.752057]        SyS_getdents+0xb0/0x160
[21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[21002.762371]
[21002.762371] -> #2 (&type->i_mutex_dir_key#3){++}:
[21002.769661]        lock_acquire+0xc9/0x230
[21002.774239]        down_read+0x51/0xb0
[21002.778429]        lookup_slow+0xde/0x210
[21002.782903]        walk_component+0x160/0x250
[21002.787765]        link_path_walk+0x1a6/0x610
[21002.792625]        path_openat+0xe4/0xd50
[21002.797100]        do_filp_open+0x91/0x100
[21002.801673]        file_open_name+0xf5/0x130
[21002.806429]        filp_open+0x33/0x50
[21002.810620]        kernel_read_file_from_path+0x39/0x80
[21002.816459]        _request_firmware+0x39f/0x880
[21002.821610]        request_firmware_direct+0x37/0x50
[21002.827151]        request_microcode_fw+0x64/0xe0
[21002.832401]        reload_store+0xf7/0x180
[21002.836974]        dev_attr_store+0x18/0x30
[21002.841641]        sysfs_kf_write+0x44/0x60
[21002.846318]        kernfs_fop_write+0x113/0x1a0
[21002.851374]        __vfs_write+0x37/0x170
[21002.855849]        vfs_write+0xc7/0x1c0
[21002.860128]        SyS_write+0x58/0xc0
[21002.864313]        do_syscall_64+0x6c/0x1f0
[21002.868973]        return_from_SYSCALL_64+0x0/0x7a
[21002.874317]
[21002.874317] -> #1 (microcode_mutex){+.+.+.}:
[21002.880748]        lock_acquire+0xc9/0x230
[21002.885322]        __mutex_lock+0x88/0x960
[21002.889894]        mutex_lock_nested+0x1b/0x20
[21002.894854]        microcode_init+0xbb/0x208
[21002.899617]        do_one_initcall+0x51/0x1a9
[21002.904481]        kernel_init_freeable+0x208/0x2a7
[21002.909922]        kernel_init+0xe/0x104
[21002.914298]        ret_from_fork+0x2a/0x40
[21002.918867]
[21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++}:
[21002.926058]        __lock_acquire+0x153c/0x1550
[21002.931112]        lock_acquire+0xc9/0x230
[21002.935688]        cpus_read_lock+0x4b/0x90
[21002.940353]        drain_all_stock.part.35+0x18/0x140
[21002.945987]        try_charge+0x3ab/0x6e0
[21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
[21002.955902]        shmem_getpage_gfp+0x25f/0x1050
[21002.961149]        shmem_fault+0x96/0x200
[21002.965621]        __do_fault+0x1e/0xa0
[21002.969905]        __handle_mm_fault+0x9c3/0xe00
[21002.975056]        handle_mm_fault+0x16e/0x380
[21002.980013]        __do_page_fault+0x24a/0x530
[21002.984968]        do_page_fault+0x30/0x80
[21002.989537]        page_fault+0x28/0x30
[21002.993812]
[21002.993812] other info that might help us debug this:
[21002.993812]
[21003.002744] Chain exists of:
[21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
[21003.002744]
[21003.016238]  Possible unsafe locking scenario:
[21003.016238]
[21003.022843]        CPU0                    CPU1
[21003.027896]        ----                    ----
[21003.032948]   lock(&mm->mmap_sem);
[21003.036741]                                lock(&type->i_mutex_dir_key#3);
[21003.044419]                                lock(&mm->mmap_sem);
[21003.051025]   lock(cpu_hotplug_lock.rw_sem);
[21003.055788]
[21003.055788]  *** DEADLOCK ***
[21003.055788]
[21003.062393] 2 locks held by a.out/4771:
[21003.066675]  #0:  (&mm->mmap_sem){++}, at: [] __do_page_fault+0x175/0x530
[21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [] try_charge+0x397/0x6e0
[21003.086198]
[21003.086198] stack backtrace:
[21003.091059] CPU: 6 PID: 4771 Comm: a.out Not tainted 4.13.0-rc3-next-20170807 #12
[21003.099409] Hardware name: Dell Inc. 
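
To make the dependency chain in the splat easier to follow, here is a
minimal userspace sketch of the idea behind such a report: record every
"held -> acquired" ordering as an edge in a graph and refuse an edge that
would close a cycle. It is not lockdep's actual implementation; the enum,
struct and function names are made up for illustration, and only the four
lock classes from the splat are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The four lock classes that appear in the splat above. */
enum lock_class { CPU_HOTPLUG, MICROCODE, I_MUTEX_DIR, MMAP_SEM, NR_CLASSES };

/* edge[a][b] is true if b was ever acquired while a was held. */
struct lock_graph { bool edge[NR_CLASSES][NR_CLASSES]; };

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is "to" reachable from "from" via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record that "acquired" was taken while "held" was held. Returns false,
 * without recording the edge, if "held" is already reachable from
 * "acquired", i.e. if the new edge would create a circular dependency.
 */
static bool add_dependency(struct lock_graph *g, int held, int acquired)
{
	bool seen[NR_CLASSES] = { false };

	if (reachable(g, acquired, held, seen))
		return false;
	g->edge[held][acquired] = true;
	return true;
}
```

Feeding it the chain from the report (cpu_hotplug_lock -> microcode_mutex
-> i_mutex_dir_key -> mmap_sem) succeeds three times; the final
mmap_sem -> cpu_hotplug_lock edge from the page fault path is the one that
gets rejected.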

Re: [PATCH] iommu/amd: Fix schedule-while-atomic BUG in initialization code

2017-07-26 Thread Artem Savkov
On Wed, Jul 26, 2017 at 02:26:14PM +0200, Joerg Roedel wrote:
> Hi Artem, Thomas,
> 
> On Wed, Jul 26, 2017 at 12:42:49PM +0200, Thomas Gleixner wrote:
> > On Tue, 25 Jul 2017, Artem Savkov wrote:
> > 
> > > Hi,
> > > 
> > > Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
> > > checks early" seem to have uncovered an issue with amd-iommu/x2apic.
> > > 
> > > Starting with that commit the following warning started to show up on AMD
> > > systems during boot:
> >  
> > > [0.16] BUG: sleeping function called from invalid context at 
> > > kernel/locking/mutex.c:747 
> > 
> > > [0.16]  mutex_lock_nested+0x1b/0x20 
> > > [0.16]  register_syscore_ops+0x1d/0x70 
> > > [0.16]  state_next+0x119/0x910 
> > > [0.16]  iommu_go_to_state+0x29/0x30 
> > > [0.16]  amd_iommu_enable+0x13/0x23 
> > > [0.16]  irq_remapping_enable+0x1b/0x39 
> > > [0.16]  enable_IR_x2apic+0x91/0x196 
> > > [0.16]  default_setup_apic_routing+0x16/0x6e 
> > > [0.16]  native_smp_prepare_cpus+0x257/0x2d5
> 
> Thanks for the report!
> 
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2440,7 +2440,6 @@ static int __init state_next(void)
> >  		break;
> >  	case IOMMU_ACPI_FINISHED:
> >  		early_enable_iommus();
> > -		register_syscore_ops(&amd_iommu_syscore_ops);
> >  		x86_platform.iommu_shutdown = disable_iommus;
> >  		init_state = IOMMU_ENABLED;
> >  		break;
> > @@ -2559,6 +2558,8 @@ static int __init amd_iommu_init(void)
> >  			for_each_iommu(iommu)
> >  				iommu_flush_all_caches(iommu);
> >  		}
> > +	} else {
> > +		register_syscore_ops(&amd_iommu_syscore_ops);
> >  	}
> >  
> >  	return ret;
> 
> Yes, that should fix it, but I think it's better to just move the
> register_syscore_ops() call to a later initialization step, like in the
> patch below. I tested it and will queue it to my iommu/fixes branch.

Checked it as well just in case, didn't see any issues. Thank you.

Reported-and-tested-by: Artem Savkov <asav...@redhat.com>

-- 
Regards,
  Artem
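
For anyone reading this thread later, the shape of the problem and of both
proposed fixes can be modelled in a few lines of userspace C. This is a toy
model only: init_model and the *_model functions are hypothetical names, and
the sleep_violations counter merely stands in for the warning that
might_sleep() prints when a sleeping function is called with interrupts
disabled.

```c
#include <assert.h>
#include <stdbool.h>

struct init_model {
	bool irqs_disabled;      /* true during the early x2apic enable window */
	bool syscore_registered;
	int  sleep_violations;   /* what the might_sleep() check would report */
};

/* Stand-in for a function that takes a mutex and may therefore sleep. */
static void register_syscore_ops_model(struct init_model *m)
{
	if (m->irqs_disabled)
		m->sleep_violations++;
	m->syscore_registered = true;
}

/* Early enable step, run with interrupts disabled. */
static void early_enable_model(struct init_model *m, bool register_early)
{
	m->irqs_disabled = true;
	if (register_early)
		register_syscore_ops_model(m);
	m->irqs_disabled = false;
}

/* Later initialization step, run in normal process context. */
static void late_init_model(struct init_model *m)
{
	if (!m->syscore_registered)
		register_syscore_ops_model(m);
}
```

Registering from the early step records one violation; deferring the call
to the later step registers the ops just the same and records none.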


amd-iommu/x2apic: sleeping function called from invalid context

2017-07-25 Thread Artem Savkov
Hi,

Commit 1c3c5ea "sched/core: Enable might_sleep() and smp_processor_id()
checks early" seems to have uncovered an issue with amd-iommu/x2apic.

Starting with that commit the following warning started to show up on AMD
systems during boot:

[0.140480] smpboot: Max logical packages: 6
[0.16] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[0.16] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0
[0.16] no locks held by swapper/0/1.
[0.16] irq event stamp: 304
[0.16] hardirqs last  enabled at (303): [] _raw_spin_unlock_irqrestore+0x36/0x60
[0.16] hardirqs last disabled at (304): [] enable_IR_x2apic+0x79/0x196
[0.16] softirqs last  enabled at (36): [] __do_softirq+0x35f/0x4ec
[0.16] softirqs last disabled at (31): [] irq_exit+0x105/0x120
[0.16] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2.1.el7a.test.x86_64.debug #1
[0.16] Hardware name:  PowerEdge C6145 /040N24, BIOS 3.5.0 10/28/2014
[0.16] Call Trace: 
[0.16]  dump_stack+0x85/0xca 
[0.16]  ___might_sleep+0x22a/0x260 
[0.16]  __might_sleep+0x4a/0x80 
[0.16]  __mutex_lock+0x58/0x960 
[0.16]  ? iommu_completion_wait.part.17+0xb5/0x160 
[0.16]  ? register_syscore_ops+0x1d/0x70 
[0.16]  ? iommu_flush_all_caches+0x120/0x150 
[0.16]  mutex_lock_nested+0x1b/0x20 
[0.16]  register_syscore_ops+0x1d/0x70 
[0.16]  state_next+0x119/0x910 
[0.16]  iommu_go_to_state+0x29/0x30 
[0.16]  amd_iommu_enable+0x13/0x23 
[0.16]  irq_remapping_enable+0x1b/0x39 
[0.16]  enable_IR_x2apic+0x91/0x196 
[0.16]  default_setup_apic_routing+0x16/0x6e 
[0.16]  native_smp_prepare_cpus+0x257/0x2d5 
[0.16]  kernel_init_freeable+0x131/0x2a7 
[0.16]  ? kernel_init+0xe/0x104 
[0.16]  ? _raw_spin_unlock_irq+0x2c/0x40 
[0.16]  ? rest_init+0xe0/0xe0 
[0.16]  kernel_init+0xe/0x104 
[0.16]  ret_from_fork+0x2a/0x40 
[0.160010] Switched APIC routing to physical flat. 

-- 
Regards,
  Artem


[PATCH v2] Use ctlr directly in rdac_failover_get()

2017-05-20 Thread Artem Savkov
rdac_failover_get() references struct rdac_controller as
ctlr->ms_sdev->handler_data->ctlr for no apparent reason. Besides being
inefficient, this also introduces a null-pointer dereference as
send_mode_select() sets ctlr->ms_sdev to NULL before calling
rdac_failover_get():

[   18.432550] device-mapper: multipath service-time: version 0.3.0 loaded
[   18.436124] BUG: unable to handle kernel NULL pointer dereference at 
0790
[   18.436129] IP: send_mode_select+0xca/0x560
[   18.436129] PGD 0
[   18.436130] P4D 0
[   18.436130]
[   18.436132] Oops:  [#1] SMP
[   18.436133] Modules linked in: dm_service_time sd_mod dm_multipath amdkfd 
amd_iommu_v2 radeon(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops ttm qla2xxx drm serio_raw scsi_transport_fc bnx2 i2c_core 
dm_mirror dm_region_hash dm_log dm_mod
[   18.436143] CPU: 4 PID: 443 Comm: kworker/u16:2 Not tainted 
4.12.0-rc1.1.el7.test.x86_64 #1
[   18.436144] Hardware name: IBM BladeCenter LS22 -[79013SG]-/Server Blade, 
BIOS -[L8E164AUS-1.07]- 05/25/2011
[   18.436145] Workqueue: kmpath_rdacd send_mode_select
[   18.436146] task: 880225116a40 task.stack: c90002bd8000
[   18.436148] RIP: 0010:send_mode_select+0xca/0x560
[   18.436148] RSP: 0018:c90002bdbda8 EFLAGS: 00010246
[   18.436149] RAX:  RBX: c90002bdbe08 RCX: 88017ef04a80
[   18.436150] RDX: c90002bdbe08 RSI: 88017ef04a80 RDI: 8802248e4388
[   18.436151] RBP: c90002bdbe48 R08:  R09: 81c104c0
[   18.436151] R10: 01ff R11: 035a R12: c90002bdbdd8
[   18.436152] R13: 8802248e4390 R14: 880225152800 R15: 8802248e4400
[   18.436153] FS:  () GS:880227d0() 
knlGS:
[   18.436154] CS:  0010 DS:  ES:  CR0: 80050033
[   18.436154] CR2: 0790 CR3: 00042535b000 CR4: 06e0
[   18.436155] Call Trace:
[   18.436159]  ? rdac_activate+0x14e/0x150
[   18.436161]  ? refcount_dec_and_test+0x11/0x20
[   18.436162]  ? kobject_put+0x1c/0x50
[   18.436165]  ? scsi_dh_activate+0x6f/0xd0
[   18.436168]  process_one_work+0x149/0x360
[   18.436170]  worker_thread+0x4d/0x3c0
[   18.436172]  kthread+0x109/0x140
[   18.436173]  ? rescuer_thread+0x380/0x380
[   18.436174]  ? kthread_park+0x60/0x60
[   18.436176]  ret_from_fork+0x2c/0x40
[   18.436177] Code: 49 c7 46 20 00 00 00 00 4c 89 ef c6 07 00 0f 1f 40 00 45 
31 ed c7 45 b0 05 00 00 00 44 89 6d b4 4d 89 f5 4c 8b 75 a8 49 8b 45 20 <48> 8b 
b0 90 07 00 00 48 8b 56 10 8b 42 10 48 8d 7a 28 85 c0 0f
[   18.436192] RIP: send_mode_select+0xca/0x560 RSP: c90002bdbda8
[   18.436192] CR2: 0790
[   18.436198] ---[ end trace 40f3e4dca1ffabdd ]---
[   18.436199] Kernel panic - not syncing: Fatal exception
[   18.436222] Kernel Offset: disabled
[-- MARK -- Thu May 18 11:45:00 2017]

Fixes: 3278255 scsi_dh_rdac: switch to scsi_execute_req_flags()
Cc: sta...@vger.kernel.org
Signed-off-by: Artem Savkov 
---
 drivers/scsi/device_handler/scsi_dh_rdac.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 3cbab87..2ceff58 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -265,18 +265,16 @@ static unsigned int rdac_failover_get(struct rdac_controller *ctlr,
 					  struct list_head *list,
 					  unsigned char *cdb)
 {
-	struct scsi_device *sdev = ctlr->ms_sdev;
-	struct rdac_dh_data *h = sdev->handler_data;
 	struct rdac_mode_common *common;
 	unsigned data_size;
 	struct rdac_queue_data *qdata;
 	u8 *lun_table;
 
-	if (h->ctlr->use_ms10) {
+	if (ctlr->use_ms10) {
 		struct rdac_pg_expanded *rdac_pg;
 
 		data_size = sizeof(struct rdac_pg_expanded);
-		rdac_pg = &h->ctlr->mode_select.expanded;
+		rdac_pg = &ctlr->mode_select.expanded;
 		memset(rdac_pg, 0, data_size);
 		common = &rdac_pg->common;
 		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER + 0x40;
@@ -288,7 +286,7 @@ static unsigned int rdac_failover_get(struct rdac_controller *ctlr,
 		struct rdac_pg_legacy *rdac_pg;
 
 		data_size = sizeof(struct rdac_pg_legacy);
-		rdac_pg = &h->ctlr->mode_select.legacy;
+		rdac_pg = &ctlr->mode_select.legacy;
 		memset(rdac_pg, 0, data_size);
 		common = &rdac_pg->common;
 		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER;
@@ -304,7 +302,7 @@ static unsigned int rdac_failover_get(struct rdac_controller *ctlr,
 	}
 
 	/* Prepare the command. */
-	if (h->ctlr->use_ms10) {
+	if (ctlr->use_ms10) {
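
As an illustration of why the round trip through ms_sdev is fragile, here is
a reduced userspace sketch. The *_model structures are hypothetical and keep
only the fields needed to show the pointer chain; the -1 return value marks
the spot where the kernel code would instead dereference NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the scsi_dh_rdac structures. */
struct rdac_controller_model;
struct rdac_dh_data_model { struct rdac_controller_model *ctlr; };
struct scsi_device_model { struct rdac_dh_data_model *handler_data; };
struct rdac_controller_model {
	int use_ms10;
	struct scsi_device_model *ms_sdev; /* may be reset to NULL by the caller */
};

/*
 * Old pattern: reach the controller via ctlr->ms_sdev->handler_data->ctlr.
 * Returns -1 where the unguarded kernel code would oops.
 */
static int use_ms10_indirect(const struct rdac_controller_model *ctlr)
{
	if (!ctlr->ms_sdev)
		return -1;
	return ctlr->ms_sdev->handler_data->ctlr->use_ms10;
}

/* Patched pattern: use the controller that was passed in. */
static int use_ms10_direct(const struct rdac_controller_model *ctlr)
{
	return ctlr->use_ms10;
}
```

Both helpers agree while ms_sdev is set; once it is cleared, only the direct
access still works, which is the situation send_mode_select() creates
according to the description above.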

[PATCH] Use ctlr directly in rdac_failover_get()

2017-05-19 Thread Artem Savkov
rdac_failover_get() references struct rdac_controller as
ctlr->ms_sdev->handler_data->ctlr for no apparent reason. Besides being
inefficient, this also introduces a null-pointer dereference as
send_mode_select() sets ctlr->ms_sdev to NULL before calling
rdac_failover_get():

[   18.432550] device-mapper: multipath service-time: version 0.3.0 loaded
[   18.436124] BUG: unable to handle kernel NULL pointer dereference at 
0790
[   18.436129] IP: send_mode_select+0xca/0x560
[   18.436129] PGD 0
[   18.436130] P4D 0
[   18.436130]
[   18.436132] Oops:  [#1] SMP
[   18.436133] Modules linked in: dm_service_time sd_mod dm_multipath amdkfd 
amd_iommu_v2 radeon(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops ttm qla2xxx drm serio_raw scsi_transport_fc bnx2 i2c_core 
dm_mirror dm_region_hash dm_log dm_mod
[   18.436143] CPU: 4 PID: 443 Comm: kworker/u16:2 Not tainted 
4.12.0-rc1.1.el7.test.x86_64 #1
[   18.436144] Hardware name: IBM BladeCenter LS22 -[79013SG]-/Server Blade, 
BIOS -[L8E164AUS-1.07]- 05/25/2011
[   18.436145] Workqueue: kmpath_rdacd send_mode_select
[   18.436146] task: 880225116a40 task.stack: c90002bd8000
[   18.436148] RIP: 0010:send_mode_select+0xca/0x560
[   18.436148] RSP: 0018:c90002bdbda8 EFLAGS: 00010246
[   18.436149] RAX:  RBX: c90002bdbe08 RCX: 88017ef04a80
[   18.436150] RDX: c90002bdbe08 RSI: 88017ef04a80 RDI: 8802248e4388
[   18.436151] RBP: c90002bdbe48 R08:  R09: 81c104c0
[   18.436151] R10: 01ff R11: 035a R12: c90002bdbdd8
[   18.436152] R13: 8802248e4390 R14: 880225152800 R15: 8802248e4400
[   18.436153] FS:  () GS:880227d0() 
knlGS:
[   18.436154] CS:  0010 DS:  ES:  CR0: 80050033
[   18.436154] CR2: 0790 CR3: 00042535b000 CR4: 06e0
[   18.436155] Call Trace:
[   18.436159]  ? rdac_activate+0x14e/0x150
[   18.436161]  ? refcount_dec_and_test+0x11/0x20
[   18.436162]  ? kobject_put+0x1c/0x50
[   18.436165]  ? scsi_dh_activate+0x6f/0xd0
[   18.436168]  process_one_work+0x149/0x360
[   18.436170]  worker_thread+0x4d/0x3c0
[   18.436172]  kthread+0x109/0x140
[   18.436173]  ? rescuer_thread+0x380/0x380
[   18.436174]  ? kthread_park+0x60/0x60
[   18.436176]  ret_from_fork+0x2c/0x40
[   18.436177] Code: 49 c7 46 20 00 00 00 00 4c 89 ef c6 07 00 0f 1f 40 00 45 
31 ed c7 45 b0 05 00 00 00 44 89 6d b4 4d 89 f5 4c 8b 75 a8 49 8b 45 20 <48> 8b 
b0 90 07 00 00 48 8b 56 10 8b 42 10 48 8d 7a 28 85 c0 0f
[   18.436192] RIP: send_mode_select+0xca/0x560 RSP: c90002bdbda8
[   18.436192] CR2: 0790
[   18.436198] ---[ end trace 40f3e4dca1ffabdd ]---
[   18.436199] Kernel panic - not syncing: Fatal exception
[   18.436222] Kernel Offset: disabled
[-- MARK -- Thu May 18 11:45:00 2017]

Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 drivers/scsi/device_handler/scsi_dh_rdac.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c 
b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 3cbab87..2ceff58 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -265,18 +265,16 @@ static unsigned int rdac_failover_get(struct 
rdac_controller *ctlr,
  struct list_head *list,
  unsigned char *cdb)
 {
-   struct scsi_device *sdev = ctlr->ms_sdev;
-   struct rdac_dh_data *h = sdev->handler_data;
struct rdac_mode_common *common;
unsigned data_size;
struct rdac_queue_data *qdata;
u8 *lun_table;
 
-   if (h->ctlr->use_ms10) {
+   if (ctlr->use_ms10) {
struct rdac_pg_expanded *rdac_pg;
 
data_size = sizeof(struct rdac_pg_expanded);
-   rdac_pg = &h->ctlr->mode_select.expanded;
+   rdac_pg = &ctlr->mode_select.expanded;
memset(rdac_pg, 0, data_size);
common = &rdac_pg->common;
rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER + 0x40;
@@ -288,7 +286,7 @@ static unsigned int rdac_failover_get(struct 
rdac_controller *ctlr,
struct rdac_pg_legacy *rdac_pg;
 
data_size = sizeof(struct rdac_pg_legacy);
-   rdac_pg = &h->ctlr->mode_select.legacy;
+   rdac_pg = &ctlr->mode_select.legacy;
memset(rdac_pg, 0, data_size);
common = &rdac_pg->common;
rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER;
@@ -304,7 +302,7 @@ static unsigned int rdac_failover_get(struct 
rdac_controller *ctlr,
}
 
/* Prepare the command. */
-   if (h->ctlr->use_ms10) {
+   if (ctlr->use_ms10) {
cdb[0] = MODE_SELECT_10;
cdb[7] = data_size >> 8;
cdb[8] = data_size & 0xff;
-- 
1.8.3.1



[PATCH v2] nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()

2017-04-21 Thread Artem Savkov
Calling pnfs_put_lseg() on an IS_ERR pointer results in a NULL pointer
dereference like the one below. At the same time, the check of the return
value of filelayout_check_deviceid() sets lseg to an error pointer but does
not free it first.

[ 3000.636161] BUG: unable to handle kernel NULL pointer dereference at 
003c
[ 3000.636970] IP: pnfs_put_lseg+0x29/0x100 [nfsv4]
[ 3000.637420] PGD 4f23b067
[ 3000.637421] PUD 4a0f4067
[ 3000.637679] PMD 0
[ 3000.637937]
[ 3000.638287] Oops:  [#1] SMP
[ 3000.638591] Modules linked in: nfs_layout_nfsv41_files nfsv3 nfnetlink_queue 
nfnetlink_log nfnetlink bluetooth rfkill rpcsec_gss_krb5 nfsv4 nfs fscache 
binfmt_misc arc4 md4 nls_utf8 cifs ccm dns_resolver rpcrdma ib_isert 
iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt 
target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad 
ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 
nf_conntrack crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr 
virtio_balloon ppdev virtio_rng parport_pc i2c_piix4 parport acpi_cpufreq nfsd 
auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ata_generic pata_acpi 
virtio_blk virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt 
fb_sys_fops crc32c_intel ata_piix ttm libata drm serio_raw
[ 3000.645245]  i2c_core virtio_pci virtio_ring virtio floppy dm_mirror 
dm_region_hash dm_log dm_mod [last unloaded: xt_u32]
[ 3000.646360] CPU: 1 PID: 26402 Comm: date Not tainted 
4.11.0-rc7.1.el7.test.x86_64 #1
[ 3000.647092] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 3000.647638] task: 8800415ada00 task.stack: c9ff
[ 3000.648207] RIP: 0010:pnfs_put_lseg+0x29/0x100 [nfsv4]
[ 3000.648696] RSP: 0018:c9ff39b8 EFLAGS: 00010246
[ 3000.649193] RAX:  RBX: fff4 RCX: 000d43be
[ 3000.649859] RDX: 000d43bd RSI:  RDI: fff4
[ 3000.650530] RBP: c9ff39d8 R08: 0001e320 R09: a05c35ce
[ 3000.651203] R10: 88007fd1e320 R11: ea0001283d80 R12: 01400040
[ 3000.651875] R13: 88004f77d9f0 R14: c9ff3cd8 R15: 8800417ade00
[ 3000.652546] FS:  7fac4d5cd740() GS:88007fd0() 
knlGS:
[ 3000.653304] CS:  0010 DS:  ES:  CR0: 80050033
[ 3000.653849] CR2: 003c CR3: 4f08 CR4: 000406e0
[ 3000.654527] Call Trace:
[ 3000.654771]  fl_pnfs_update_layout.constprop.20+0x10c/0x150 
[nfs_layout_nfsv41_files]
[ 3000.655505]  filelayout_pg_init_write+0x21d/0x270 [nfs_layout_nfsv41_files]
[ 3000.656195]  __nfs_pageio_add_request+0x11c/0x490 [nfs]
[ 3000.656698]  nfs_pageio_add_request+0xac/0x260 [nfs]
[ 3000.657180]  nfs_do_writepage+0x109/0x2e0 [nfs]
[ 3000.657616]  nfs_writepages_callback+0x16/0x30 [nfs]
[ 3000.658096]  write_cache_pages+0x26f/0x510
[ 3000.658495]  ? nfs_do_writepage+0x2e0/0x2e0 [nfs]
[ 3000.658946]  ? _raw_spin_unlock_bh+0x1e/0x20
[ 3000.659357]  ? wb_wakeup_delayed+0x5f/0x70
[ 3000.659748]  ? __mark_inode_dirty+0x2eb/0x360
[ 3000.660170]  nfs_writepages+0x84/0xd0 [nfs]
[ 3000.660575]  ? nfs_updatepage+0x571/0xb70 [nfs]
[ 3000.661012]  do_writepages+0x1e/0x30
[ 3000.661358]  __filemap_fdatawrite_range+0xc6/0x100
[ 3000.661819]  filemap_write_and_wait_range+0x41/0x90
[ 3000.662292]  nfs_file_fsync+0x34/0x1f0 [nfs]
[ 3000.662704]  vfs_fsync_range+0x3d/0xb0
[ 3000.663065]  vfs_fsync+0x1c/0x20
[ 3000.663385]  nfs4_file_flush+0x57/0x80 [nfsv4]
[ 3000.663813]  filp_close+0x2f/0x70
[ 3000.664132]  __close_fd+0x9a/0xc0
[ 3000.664453]  SyS_close+0x23/0x50
[ 3000.664785]  do_syscall_64+0x67/0x180
[ 3000.665162]  entry_SYSCALL64_slow_path+0x25/0x25
[ 3000.665600] RIP: 0033:0x7fac4d0e1e90
[ 3000.665946] RSP: 002b:7ffd54e90c88 EFLAGS: 0246 ORIG_RAX: 
0003
[ 3000.79] RAX: ffda RBX: 7fac4d3b5400 RCX: 7fac4d0e1e90
[ 3000.667349] RDX:  RSI: 7fac4d5d9000 RDI: 0001
[ 3000.668031] RBP:  R08: 7fac4d3b6a00 R09: 7fac4d5cd740
[ 3000.668709] R10: 7ffd54e909e0 R11: 0246 R12: 
[ 3000.669385] R13: 7fac4d3b5e80 R14:  R15: 
[ 3000.670061] Code: 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 41 56 41 55 41 
54 53 48 89 fb 0f 84 97 00 00 00 f6 05 16 8f bc ff 10 0f 85 a6 00 00 00 <4c> 8b 
63 48 48 8d 7b 38 49 8b 84 24 90 00 00 00 4c 8d a8 88 00
[ 3000.671831] RIP: pnfs_put_lseg+0x29/0x100 [nfsv4] RSP: c9ff39b8
[ 3000.672462] CR2: 003c

Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 fs/nfs/filelayout/filelayout.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index acd30ba..fb39fd8 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -921,11 +921,11 @@ fl_pnfs_update_layout(struct inode *ino,
fl = FILELAYOUT_LSEG(lseg);
 

[PATCH] nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()

2017-04-21 Thread Artem Savkov
Calling pnfs_put_lseg() on an IS_ERR pointer results in a NULL pointer
dereference like the one below. The output of fl_pnfs_update_layout() is
checked after each call, so it does not seem that it should try to handle
these errors on its own.

[ 3000.636161] BUG: unable to handle kernel NULL pointer dereference at 
003c
[ 3000.636970] IP: pnfs_put_lseg+0x29/0x100 [nfsv4]
[ 3000.637420] PGD 4f23b067
[ 3000.637421] PUD 4a0f4067
[ 3000.637679] PMD 0
[ 3000.637937]
[ 3000.638287] Oops:  [#1] SMP
[ 3000.638591] Modules linked in: nfs_layout_nfsv41_files nfsv3 nfnetlink_queue 
nfnetlink_log nfnetlink bluetooth rfkill rpcsec_gss_krb5 nfsv4 nfs fscache 
binfmt_misc arc4 md4 nls_utf8 cifs ccm dns_resolver rpcrdma ib_isert 
iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt 
target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad 
ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 
nf_conntrack crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr 
virtio_balloon ppdev virtio_rng parport_pc i2c_piix4 parport acpi_cpufreq nfsd 
auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ata_generic pata_acpi 
virtio_blk virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt 
fb_sys_fops crc32c_intel ata_piix ttm libata drm serio_raw
[ 3000.645245]  i2c_core virtio_pci virtio_ring virtio floppy dm_mirror 
dm_region_hash dm_log dm_mod [last unloaded: xt_u32]
[ 3000.646360] CPU: 1 PID: 26402 Comm: date Not tainted 
4.11.0-rc7.1.el7.test.x86_64 #1
[ 3000.647092] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 3000.647638] task: 8800415ada00 task.stack: c9ff
[ 3000.648207] RIP: 0010:pnfs_put_lseg+0x29/0x100 [nfsv4]
[ 3000.648696] RSP: 0018:c9ff39b8 EFLAGS: 00010246
[ 3000.649193] RAX:  RBX: fff4 RCX: 000d43be
[ 3000.649859] RDX: 000d43bd RSI:  RDI: fff4
[ 3000.650530] RBP: c9ff39d8 R08: 0001e320 R09: a05c35ce
[ 3000.651203] R10: 88007fd1e320 R11: ea0001283d80 R12: 01400040
[ 3000.651875] R13: 88004f77d9f0 R14: c9ff3cd8 R15: 8800417ade00
[ 3000.652546] FS:  7fac4d5cd740() GS:88007fd0() 
knlGS:
[ 3000.653304] CS:  0010 DS:  ES:  CR0: 80050033
[ 3000.653849] CR2: 003c CR3: 4f08 CR4: 000406e0
[ 3000.654527] Call Trace:
[ 3000.654771]  fl_pnfs_update_layout.constprop.20+0x10c/0x150 
[nfs_layout_nfsv41_files]
[ 3000.655505]  filelayout_pg_init_write+0x21d/0x270 [nfs_layout_nfsv41_files]
[ 3000.656195]  __nfs_pageio_add_request+0x11c/0x490 [nfs]
[ 3000.656698]  nfs_pageio_add_request+0xac/0x260 [nfs]
[ 3000.657180]  nfs_do_writepage+0x109/0x2e0 [nfs]
[ 3000.657616]  nfs_writepages_callback+0x16/0x30 [nfs]
[ 3000.658096]  write_cache_pages+0x26f/0x510
[ 3000.658495]  ? nfs_do_writepage+0x2e0/0x2e0 [nfs]
[ 3000.658946]  ? _raw_spin_unlock_bh+0x1e/0x20
[ 3000.659357]  ? wb_wakeup_delayed+0x5f/0x70
[ 3000.659748]  ? __mark_inode_dirty+0x2eb/0x360
[ 3000.660170]  nfs_writepages+0x84/0xd0 [nfs]
[ 3000.660575]  ? nfs_updatepage+0x571/0xb70 [nfs]
[ 3000.661012]  do_writepages+0x1e/0x30
[ 3000.661358]  __filemap_fdatawrite_range+0xc6/0x100
[ 3000.661819]  filemap_write_and_wait_range+0x41/0x90
[ 3000.662292]  nfs_file_fsync+0x34/0x1f0 [nfs]
[ 3000.662704]  vfs_fsync_range+0x3d/0xb0
[ 3000.663065]  vfs_fsync+0x1c/0x20
[ 3000.663385]  nfs4_file_flush+0x57/0x80 [nfsv4]
[ 3000.663813]  filp_close+0x2f/0x70
[ 3000.664132]  __close_fd+0x9a/0xc0
[ 3000.664453]  SyS_close+0x23/0x50
[ 3000.664785]  do_syscall_64+0x67/0x180
[ 3000.665162]  entry_SYSCALL64_slow_path+0x25/0x25
[ 3000.665600] RIP: 0033:0x7fac4d0e1e90
[ 3000.665946] RSP: 002b:7ffd54e90c88 EFLAGS: 0246 ORIG_RAX: 
0003
[ 3000.79] RAX: ffda RBX: 7fac4d3b5400 RCX: 7fac4d0e1e90
[ 3000.667349] RDX:  RSI: 7fac4d5d9000 RDI: 0001
[ 3000.668031] RBP:  R08: 7fac4d3b6a00 R09: 7fac4d5cd740
[ 3000.668709] R10: 7ffd54e909e0 R11: 0246 R12: 
[ 3000.669385] R13: 7fac4d3b5e80 R14:  R15: 
[ 3000.670061] Code: 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 41 56 41 55 41 
54 53 48 89 fb 0f 84 97 00 00 00 f6 05 16 8f bc ff 10 0f 85 a6 00 00 00 <4c> 8b 
63 48 48 8d 7b 38 49 8b 84 24 90 00 00 00 4c 8d a8 88 00
[ 3000.671831] RIP: pnfs_put_lseg+0x29/0x100 [nfsv4] RSP: c9ff39b8
[ 3000.672462] CR2: 003c

Signed-off-by: Artem Savkov <asav...@redhat.com>
---
 fs/nfs/filelayout/filelayout.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index acd30ba..a53d1b7 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -924,8 +924,6 @@ fl_pnfs_update_layout(struct inode *ino,
if (status)
lseg = ERR_PTR(status)
