from:"Thadeu Lima de Souza Cascardo"

[PATCH 5.4,4.19 2/2] tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together

2024-04-24 Thread Thadeu Lima de Souza Cascardo

From: "Robin H. Johnson" 

commit e531e90b5ab0f7ce5ff298e165214c1aec6ed187 upstream.

Running endpoint security solutions like Sentinel1 that use perf-based
tracing heavily lead to this repeated dump complaining about dockerd.
The default value of 2048 is nowhere near not large enough.

Using the prior patch "tracing: show size of requested buffer", we get
"perf buffer not large enough, wanted 6644, have 6144", after repeated
up-sizing (I did 2/4/6/8K). With 8K, the problem doesn't occur at all,
so below is the trace for 6K.

I'm wondering if this value should be selectable at boot time, but this
is a good starting point.

```
[ cut here ]
perf buffer not large enough, wanted 6644, have 6144
WARNING: CPU: 1 PID: 4997 at kernel/trace/trace_event_perf.c:402 
perf_trace_buf_alloc+0x8c/0xa0
Modules linked in: [..]
CPU: 1 PID: 4997 Comm: sh Tainted: GT 
5.13.13-x86_64-00039-gb3959163488e #63
Hardware name: LENOVO 20KH002JUS/20KH002JUS, BIOS N23ET66W (1.41 ) 09/02/2019
RIP: 0010:perf_trace_buf_alloc+0x8c/0xa0
Code: 80 3d 43 97 d0 01 00 74 07 31 c0 5b 5d 41 5c c3 ba 00 18 00 00 89 ee 48 
c7 c7 00 82 7d 91 c6 05 25 97 d0 01 01 e8 22 ee bc 00 <0f> 0b 31 c0 eb db 66 66 
2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89
RSP: 0018:b922026b7d58 EFLAGS: 00010282
RAX:  RBX: 9da5ee012000 RCX: 0027
RDX: 9da881657828 RSI: 0001 RDI: 9da881657820
RBP: 19f4 R08:  R09: b922026b7b80
R10: b922026b7b78 R11: 91dda688 R12: 000f
R13: 9da5ee012108 R14: 9da8816570a0 R15: b922026b7e30
FS:  7f420db1a080() GS:9da88164() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0060 CR3: 0002504a8006 CR4: 003706e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 kprobe_perf_func+0x11e/0x270
 ? do_execveat_common.isra.0+0x1/0x1c0
 ? do_execveat_common.isra.0+0x5/0x1c0
 kprobe_ftrace_handler+0x10e/0x1d0
 0xc03aa0c8
 ? do_execveat_common.isra.0+0x1/0x1c0
 do_execveat_common.isra.0+0x5/0x1c0
 __x64_sys_execve+0x33/0x40
 do_syscall_64+0x6b/0xc0
 ? do_syscall_64+0x11/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f420dc1db37
Code: ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 
00 eb dc 0f 1f 84 00 00 00 00 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 8b 0d 01 43 0f 00 f7 d8 64 89 01 48
RSP: 002b:7ffd4e8b4e38 EFLAGS: 0246 ORIG_RAX: 003b
RAX: ffda RBX:  RCX: 7f420dc1db37
RDX: 564338d1e740 RSI: 564338d32d50 RDI: 564338d28f00
RBP: 564338d28f00 R08: 564338d32d50 R09: 0020
R10: 01b6 R11: 0246 R12: 564338d28f00
R13: 564338d32d50 R14: 564338d1e740 R15: 564338d28c60
---[ end trace 83ab3e8e16275e49 ]---
```

Link: https://lkml.kernel.org/r/20210831043723.13481-2-robb...@gentoo.org

Signed-off-by: Robin H. Johnson 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 include/linux/trace_events.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index b8b87e7ba93f..12ee8973ea6f 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -427,7 +427,7 @@ struct trace_event_file {
}   \
early_initcall(trace_init_perf_perm_##name);
 
-#define PERF_MAX_TRACE_SIZE2048
+#define PERF_MAX_TRACE_SIZE8192
 
 #define MAX_FILTER_STR_VAL 256 /* Should handle KSYM_SYMBOL_LEN */
 
-- 
2.34.1

[PATCH 5.15,5.10 2/2] tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together

2024-04-24 Thread Thadeu Lima de Souza Cascardo

From: "Robin H. Johnson" 

commit e531e90b5ab0f7ce5ff298e165214c1aec6ed187 upstream.

Running endpoint security solutions like Sentinel1 that use perf-based
tracing heavily lead to this repeated dump complaining about dockerd.
The default value of 2048 is nowhere near not large enough.

Using the prior patch "tracing: show size of requested buffer", we get
"perf buffer not large enough, wanted 6644, have 6144", after repeated
up-sizing (I did 2/4/6/8K). With 8K, the problem doesn't occur at all,
so below is the trace for 6K.

I'm wondering if this value should be selectable at boot time, but this
is a good starting point.

```
[ cut here ]
perf buffer not large enough, wanted 6644, have 6144
WARNING: CPU: 1 PID: 4997 at kernel/trace/trace_event_perf.c:402 
perf_trace_buf_alloc+0x8c/0xa0
Modules linked in: [..]
CPU: 1 PID: 4997 Comm: sh Tainted: GT 
5.13.13-x86_64-00039-gb3959163488e #63
Hardware name: LENOVO 20KH002JUS/20KH002JUS, BIOS N23ET66W (1.41 ) 09/02/2019
RIP: 0010:perf_trace_buf_alloc+0x8c/0xa0
Code: 80 3d 43 97 d0 01 00 74 07 31 c0 5b 5d 41 5c c3 ba 00 18 00 00 89 ee 48 
c7 c7 00 82 7d 91 c6 05 25 97 d0 01 01 e8 22 ee bc 00 <0f> 0b 31 c0 eb db 66 66 
2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89
RSP: 0018:b922026b7d58 EFLAGS: 00010282
RAX:  RBX: 9da5ee012000 RCX: 0027
RDX: 9da881657828 RSI: 0001 RDI: 9da881657820
RBP: 19f4 R08:  R09: b922026b7b80
R10: b922026b7b78 R11: 91dda688 R12: 000f
R13: 9da5ee012108 R14: 9da8816570a0 R15: b922026b7e30
FS:  7f420db1a080() GS:9da88164() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0060 CR3: 0002504a8006 CR4: 003706e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 kprobe_perf_func+0x11e/0x270
 ? do_execveat_common.isra.0+0x1/0x1c0
 ? do_execveat_common.isra.0+0x5/0x1c0
 kprobe_ftrace_handler+0x10e/0x1d0
 0xc03aa0c8
 ? do_execveat_common.isra.0+0x1/0x1c0
 do_execveat_common.isra.0+0x5/0x1c0
 __x64_sys_execve+0x33/0x40
 do_syscall_64+0x6b/0xc0
 ? do_syscall_64+0x11/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f420dc1db37
Code: ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 
00 eb dc 0f 1f 84 00 00 00 00 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 8b 0d 01 43 0f 00 f7 d8 64 89 01 48
RSP: 002b:7ffd4e8b4e38 EFLAGS: 0246 ORIG_RAX: 003b
RAX: ffda RBX:  RCX: 7f420dc1db37
RDX: 564338d1e740 RSI: 564338d32d50 RDI: 564338d28f00
RBP: 564338d28f00 R08: 564338d32d50 R09: 0020
R10: 01b6 R11: 0246 R12: 564338d28f00
R13: 564338d32d50 R14: 564338d1e740 R15: 564338d28c60
---[ end trace 83ab3e8e16275e49 ]---
```

Link: https://lkml.kernel.org/r/20210831043723.13481-2-robb...@gentoo.org

Signed-off-by: Robin H. Johnson 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 include/linux/trace_events.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d3cbe4bf4fab..17575aa2a53c 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -676,7 +676,7 @@ struct trace_event_file {
}   \
early_initcall(trace_init_perf_perm_##name);
 
-#define PERF_MAX_TRACE_SIZE2048
+#define PERF_MAX_TRACE_SIZE8192
 
 #define MAX_FILTER_STR_VAL 256U/* Should handle KSYM_SYMBOL_LEN */
 
-- 
2.34.1

[PATCH 5.15,5.10,5.4,4.19 0/2] Fix warning when tracing with large filenames

2024-04-24 Thread Thadeu Lima de Souza Cascardo

The warning described on patch "tracing: Increase PERF_MAX_TRACE_SIZE to
handle Sentinel1 and docker together" can be triggered with a perf probe on
do_execve with a large path. As PATH_MAX is larger than PERF_MAX_TRACE_SIZE
(2048 before the patch), the warning will trigger.

The fix was included in 5.16, so backporting to 5.15 and earlier LTS
kernels. Also included is a patch that better describes the attempted
allocation size.

-- 
2.34.1

[PATCH 5.15,5.10,5.4,4.19 1/2] tracing: Show size of requested perf buffer

2024-04-24 Thread Thadeu Lima de Souza Cascardo

From: "Robin H. Johnson" 

commit a90afe8d020da9298c98fddb19b7a6372e2feb45 upstream.

If the perf buffer isn't large enough, provide a hint about how large it
needs to be for whatever is running.

Link: https://lkml.kernel.org/r/20210831043723.13481-1-robb...@gentoo.org

Signed-off-by: Robin H. Johnson 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 kernel/trace/trace_event_perf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 083f648e3265..61e3a2620fa3 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -401,7 +401,8 @@ void *perf_trace_buf_alloc(int size, struct pt_regs **regs, 
int *rctxp)
BUILD_BUG_ON(PERF_MAX_TRACE_SIZE % sizeof(unsigned long));
 
if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
- "perf buffer not large enough"))
+ "perf buffer not large enough, wanted %d, have %d",
+ size, PERF_MAX_TRACE_SIZE))
return NULL;
 
*rctxp = rctx = perf_swevent_get_recursion_context();
-- 
2.34.1

Re: [PATCH] platform: x86: Typo fix in the file classmate-laptop.c

2021-03-17 Thread Thadeu Lima de Souza Cascardo

On Wed, Mar 17, 2021 at 02:13:43PM +0530, Bhaskar Chowdhury wrote:
> 
> s/derefence/dereference/
> 
> Signed-off-by: Bhaskar Chowdhury 

Acked-by: Thadeu Lima de Souza Cascardo 

> ---
>  drivers/platform/x86/classmate-laptop.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/platform/x86/classmate-laptop.c 
> b/drivers/platform/x86/classmate-laptop.c
> index 3e03e8d3a07f..9309ab5792cb 100644
> --- a/drivers/platform/x86/classmate-laptop.c
> +++ b/drivers/platform/x86/classmate-laptop.c
> @@ -956,7 +956,7 @@ static int cmpc_ipml_add(struct acpi_device *acpi)
>   /*
>* If RFKILL is disabled, rfkill_alloc will return ERR_PTR(-ENODEV).
>* This is OK, however, since all other uses of the device will not
> -  * derefence it.
> +  * dereference it.
>*/
>   if (ipml->rf) {
>   retval = rfkill_register(ipml->rf);
> --
> 2.30.2
>

Re: [PATCH v4] usb: gadget: configfs: Fix KASAN use-after-free

2021-03-11 Thread Thadeu Lima de Souza Cascardo

On Thu, Mar 11, 2021 at 02:53:52PM +0800, Macpaul Lin wrote:
> On Thu, 2021-03-11 at 14:42 +0800, Macpaul Lin wrote:
> > From: Jim Lin 
> > 
> > When gadget is disconnected, running sequence is like this.
> > . composite_disconnect
> > . Call trace:
> >   usb_string_copy+0xd0/0x128
> >   gadget_config_name_configuration_store+0x4
> >   gadget_config_name_attr_store+0x40/0x50
> >   configfs_write_file+0x198/0x1f4
> >   vfs_write+0x100/0x220
> >   SyS_write+0x58/0xa8
> > . configfs_composite_unbind
> > . configfs_composite_bind
> > 
> > In configfs_composite_bind, it has
> > "cn->strings.s = cn->configuration;"
> > 
> > When usb_string_copy is invoked. it would
> > allocate memory, copy input string, release previous pointed memory space,
> > and use new allocated memory.
> > 
> > When gadget is connected, host sends down request to get information.
> > Call trace:
> >   usb_gadget_get_string+0xec/0x168
> >   lookup_string+0x64/0x98
> >   composite_setup+0xa34/0x1ee8
> > 
> > If gadget is disconnected and connected quickly, in the failed case,
> > cn->configuration memory has been released by usb_string_copy kfree but
> > configfs_composite_bind hasn't been run in time to assign new allocated
> > "cn->configuration" pointer to "cn->strings.s".
> > 
> > When "strlen(s->s) of usb_gadget_get_string is being executed, the dangling
> > memory is accessed, "BUG: KASAN: use-after-free" error occurs.
> > 
> > Signed-off-by: Jim Lin 
> > Signed-off-by: Macpaul Lin 
> > Cc: sta...@vger.kernel.org
> > ---
> > Changes in v2:
> > Changes in v3:
> >  - Change commit description
> > Changes in v4:
> >  - Fix build error and adapt patch to kernel-5.12-rc1.
> >Replace definition "MAX_USB_STRING_WITH_NULL_LEN" with
> >"USB_MAX_STRING_WITH_NULL_LEN".
> >  - Note: The patch v2 and v3 has been verified by
> >Thadeu Lima de Souza Cascardo 
> >http://spinics.net/lists/kernel/msg3840792.html
> 
> Dear Cascardo,
> 
> Would you please help to confirm if you've tested it on Linux PC,
> Chrome OS, or an Android OS?

I tested v3 on Ubuntu GNU/Linux. I will test v4.

Cascardo.

> 
> Thanks!
> Macpaul Lin
> 
> >and
> >Macpaul Lin  on Android kernels.
> >http://lkml.org/lkml/2020/6/11/8
> >  - The patch is suggested to be applied to LTS versions.
> > 
> >  drivers/usb/gadget/configfs.c |   14 ++
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c
> > index 0d56f33..15a607c 100644
> > --- a/drivers/usb/gadget/configfs.c
> > +++ b/drivers/usb/gadget/configfs.c
> > @@ -97,6 +97,8 @@ struct gadget_config_name {
> > struct list_head list;
> >  };
> >  
> > +#define USB_MAX_STRING_WITH_NULL_LEN   (USB_MAX_STRING_LEN+1)
> > +
> >  static int usb_string_copy(const char *s, char **s_copy)
> >  {
> > int ret;
> > @@ -106,12 +108,16 @@ static int usb_string_copy(const char *s, char 
> > **s_copy)
> > if (ret > USB_MAX_STRING_LEN)
> > return -EOVERFLOW;
> >  
> > -   str = kstrdup(s, GFP_KERNEL);
> > -   if (!str)
> > -   return -ENOMEM;
> > +   if (copy) {
> > +   str = copy;
> > +   } else {
> > +   str = kmalloc(USB_MAX_STRING_WITH_NULL_LEN, GFP_KERNEL);
> > +   if (!str)
> > +   return -ENOMEM;
> > +   }
> > +   strcpy(str, s);
> > if (str[ret - 1] == '\n')
> > str[ret - 1] = '\0';
> > -   kfree(copy);
> > *s_copy = str;
> > return 0;
> >  }
>

[PATCH] powerpc/perf: prevent mixed EBB and non-EBB events

2021-02-24 Thread Thadeu Lima de Souza Cascardo

EBB events must be under exclusive groups, so there is no mix of EBB and
non-EBB events on the same PMU. This requirement worked fine as perf core
would not allow other pinned events to be scheduled together with exclusive
events.

This assumption was broken by commit 1908dc911792 ("perf: Tweak
perf_event_attr::exclusive semantics").

After that, the test cpu_event_pinned_vs_ebb_test started succeeding after
read_events, but worse, the task would not have given access to PMC1, so
when it tried to write to it, it was killed with "illegal instruction".

Preventing mixed EBB and non-EBB events from being add to the same PMU will
just revert to the previous behavior and the test will succeed.

Fixes: 1908dc911792 (perf: Tweak perf_event_attr::exclusive semantics)
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 arch/powerpc/perf/core-book3s.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 43599e671d38..d767f7944f85 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1010,9 +1010,25 @@ static int check_excludes(struct perf_event **ctrs, 
unsigned int cflags[],
  int n_prev, int n_new)
 {
int eu = 0, ek = 0, eh = 0;
+   bool ebb = false;
int i, n, first;
struct perf_event *event;
 
+   n = n_prev + n_new;
+   if (n <= 1)
+   return 0;
+
+   first = 1;
+   for (i = 0; i < n; ++i) {
+   event = ctrs[i];
+   if (first) {
+   ebb = is_ebb_event(event);
+   first = 0;
+   } else if (is_ebb_event(event) != ebb) {
+   return -EAGAIN;
+   }
+   }
+
/*
 * If the PMU we're on supports per event exclude settings then we
 * don't need to do any of this logic. NB. This assumes no PMU has both
@@ -1021,10 +1037,6 @@ static int check_excludes(struct perf_event **ctrs, 
unsigned int cflags[],
if (ppmu->flags & PPMU_ARCH_207S)
return 0;
 
-   n = n_prev + n_new;
-   if (n <= 1)
-   return 0;
-
first = 1;
for (i = 0; i < n; ++i) {
if (cflags[i] & PPMU_LIMITED_PMC_OK) {
-- 
2.27.0

Re: [PATCH v3] usb: gadget: configfs: Fix KASAN use-after-free

2021-02-22 Thread Thadeu Lima de Souza Cascardo

On Tue, Jan 17, 2017 at 12:29:09PM +0200, Felipe Balbi wrote:
> 
> Hi,
> 
> Jim Lin  writes:
> > When gadget is disconnected, running sequence is like this.
> > . composite_disconnect
> > . Call trace:
> >   usb_string_copy+0xd0/0x128
> >   gadget_config_name_configuration_store+0x4
> >   gadget_config_name_attr_store+0x40/0x50
> >   configfs_write_file+0x198/0x1f4
> >   vfs_write+0x100/0x220
> >   SyS_write+0x58/0xa8
> > . configfs_composite_unbind
> > . configfs_composite_bind
> >
> > In configfs_composite_bind, it has
> > "cn->strings.s = cn->configuration;"
> >
> > When usb_string_copy is invoked. it would
> > allocate memory, copy input string, release previous pointed memory space,
> > and use new allocated memory.
> >
> > When gadget is connected, host sends down request to get information.
> > Call trace:
> >   usb_gadget_get_string+0xec/0x168
> >   lookup_string+0x64/0x98
> >   composite_setup+0xa34/0x1ee8
> >
> > If gadget is disconnected and connected quickly, in the failed case,
> > cn->configuration memory has been released by usb_string_copy kfree but
> > configfs_composite_bind hasn't been run in time to assign new allocated
> > "cn->configuration" pointer to "cn->strings.s".
> >
> > When "strlen(s->s) of usb_gadget_get_string is being executed, the dangling
> > memory is accessed, "BUG: KASAN: use-after-free" error occurs.
> >
> > Signed-off-by: Jim Lin 
> > ---
> > Changes in v2:
> > Changes in v3:
> >  Change commit description
> 
> well, I need to be sure you tested this with Linus' tree. The reason I'm
> asking is because this could be a bug caused by Android changes. From
> your previous patch, the problem started with android_setup().
> 
> Please test with v4.10-rc4 and any configfs-based gadget.
> 
> -- 
> balbi

I tested this with dummy_hcd on top of a 5.8 kernel and I got lsusb to respond
with an error instead of the right manufacturer string, after overwriting such
a string after binding.

With the patch applied, after the string is overwritten, lsusb will show the
updated string.

Because of commit 81c7462883b0cc0a4eeef0687f80ad5b5baee5f6 ("USB: replace
hardcode maximum usb string length by definition"), the patch will need a
fixup. Should I send a v2 with my sign-off?

Thanks.
Cascardo.

Re: [PATCH 1/2] dccp: ccid: move timers to struct dccp_sock

2020-11-10 Thread Thadeu Lima de Souza Cascardo

On Mon, Nov 09, 2020 at 02:15:53PM -0800, Jakub Kicinski wrote:
> On Mon, 9 Nov 2020 18:31:34 -0300 Thadeu Lima de Souza Cascardo wrote:
> > > Which paths are those (my memory of this code is waning)? I thought
> > > disconnect is only called from the user space side (shutdown syscall).
> > > The only other way to terminate the connection is to close the socket,
> > > which Eric already fixed by postponing the destruction of ccid in that
> > > case.  
> > 
> > dccp_v4_do_rcv -> dccp_rcv_established -> dccp_parse_options ->
> > dccp_feat_parse_options -> dccp_feat_handle_nn_established ->
> > dccp_feat_activate -> __dccp_feat_activate -> dccp_hdlr_ccid ->
> > ccid_hc_tx_delete
> 
> Well, that's not a disconnect path.
> 
> There should be no CCID on a disconnected socket, tho, right? Otherwise
> if we can switch from one active CCID to another then reusing a single
> timer in struct dccp_sock for both is definitely not safe as I
> explained in my initial email.

Yeah, I agree with your initial email. The patch I submitted for that fix needs
rework, which is what I tried and failed so far. I need to get back to some
testing of my latest fix and find out what needs fixing there.

But I am also saying that simply doing a del_timer_sync on disconnect paths
won't do, because there are non-disconnect paths where there is a CCID that we
will remove and replace and that will still trigger a timer UAF.

So I have been working on a fix that involves a refcnt on ccid itself. But I
want to test that it really fixes the problem and I have spent most of the time
finding out a way to trigger the timer in a race with the disconnect path.

And that same test has showed me that this timer UAF will happen regardless of
commit 2677d20677314101293e6da0094ede7b5526d2b1, which led me into stating that
reverting it should be done in any case.

I think I can find some time this week to work a little further on the fix for
the time UAF.

Thanks.
Cascardo.

Re: [PATCH 1/2] dccp: ccid: move timers to struct dccp_sock

2020-11-09 Thread Thadeu Lima de Souza Cascardo

On Mon, Nov 09, 2020 at 01:15:54PM -0800, Jakub Kicinski wrote:
> On Mon, 9 Nov 2020 18:09:09 -0300 Thadeu Lima de Souza Cascardo wrote:
> > On Mon, Nov 09, 2020 at 09:49:38AM -0800, Jakub Kicinski wrote:
> > > On Mon, 9 Nov 2020 08:48:28 -0300 Thadeu Lima de Souza Cascardo wrote:  
> > > > On Fri, Oct 16, 2020 at 03:30:16PM -0700, Jakub Kicinski wrote:  
> > > > > On Tue, 13 Oct 2020 19:18:48 +0200 Kleber Sacilotto de Souza wrote:   
> > > > >  
> > > > > > From: Thadeu Lima de Souza Cascardo 
> > > > > > 
> > > > > > When dccps_hc_tx_ccid is freed, ccid timers may still trigger. The 
> > > > > > reason
> > > > > > del_timer_sync can't be used is because this relies on keeping a 
> > > > > > reference
> > > > > > to struct sock. But as we keep a pointer to dccps_hc_tx_ccid and 
> > > > > > free that
> > > > > > during disconnect, the timer should really belong to struct 
> > > > > > dccp_sock.
> > > > > > 
> > > > > > This addresses CVE-2020-16119.
> > > > > > 
> > > > > > Fixes: 839a6094140a (net: dccp: Convert timers to use timer_setup())
> > > > > > Signed-off-by: Thadeu Lima de Souza Cascardo 
> > > > > > 
> > > > > > Signed-off-by: Kleber Sacilotto de Souza 
> > > > > > 
> > > > > 
> > > > > I've been mulling over this fix.
> > > > > 
> > > > > The layering violation really doesn't sit well.
> > > > > 
> > > > > We're reusing the timer object. What if we are really unlucky, the
> > > > > fires and gets blocked by a cosmic ray just as it's about to try to
> > > > > lock the socket, then user manages to reconnect, and timer starts
> > > > > again. Potentially with a different CCID algo altogether?
> > > > > 
> > > > > Is disconnect ever called under the BH lock?  Maybe plumb a bool
> > > > > argument through to ccid*_hc_tx_exit() and do a sk_stop_timer_sync()
> > > > > when called from disconnect()?
> > > > > 
> > > > > Or do refcounting on ccid_priv so that the timer holds both the socket
> > > > > and the priv?
> > > > 
> > > > Sorry about too late a response. I was on vacation, then came back and 
> > > > spent a
> > > > couple of days testing this further, and had to switch to other tasks.
> > > > 
> > > > So, while testing this, I had to resort to tricks like having a very 
> > > > small
> > > > expire and enqueuing on a different CPU. Then, after some minutes, I 
> > > > hit a UAF.
> > > > That's with or without the first of the second patch.
> > > > 
> > > > I also tried to refcount ccid instead of the socket, keeping the timer 
> > > > on the
> > > > ccid, but that still hit the UAF, and that's when I had to switch 
> > > > tasks.  
> > > 
> > > Hm, not instead, as well. I think trying cancel the timer _sync from
> > > the disconnect path would be the simplest solution, tho.
> > 
> > I don't think so. On other paths, we would still have the possibility that:
> > 
> > CPU1: timer expires and is about to run
> > CPU2: calls stop_timer (which does not stop anything) and frees ccid
> > CPU1: timer runs and uses freed ccid
> > 
> > And those paths, IIUC, may be run under a SoftIRQ on the receive path, so 
> > would
> > not be able to call stop_timer_sync.
> 
> Which paths are those (my memory of this code is waning)? I thought
> disconnect is only called from the user space side (shutdown syscall).
> The only other way to terminate the connection is to close the socket,
> which Eric already fixed by postponing the destruction of ccid in that
> case.

dccp_v4_do_rcv -> dccp_rcv_established -> dccp_parse_options ->
dccp_feat_parse_options -> dccp_feat_handle_nn_established ->
dccp_feat_activate -> __dccp_feat_activate -> dccp_hdlr_ccid ->
ccid_hc_tx_delete

> 
> > > > Oh, and in the meantime, I found one or two other fixes that we
> > > > should apply, will send them shortly.
> > > > 
> > > > But I would argue that we should apply the revert as it addresses the
> > > > CVE, without really regressing the other UAF, as I argued. Does that
> > > > make sense?  
> > > 
> > > We can - it's always a little strange to go from one bug to a different
> > > without a fix - but the justification being that while the previous UAF
> > > required a race condition the new one is actually worst because it can 
> > > be triggered reliably?  
> > 
> > Well, I am arguing here that commit 2677d20677314101293e6da0094ede7b5526d2b1
> > ("dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()") doesn't
> > really fix anything. Whenever ccid_hx_tx_delete is called, that UAF might
> > happen, because the timer might trigger right after we free the ccid struct.
> > 
> > And, yes, on the other hand, we can reliably launch the DoS attack that is
> > fixed by the revert of that commit.
> 
> OK.
>

Re: [PATCH 1/2] dccp: ccid: move timers to struct dccp_sock

2020-11-09 Thread Thadeu Lima de Souza Cascardo

On Mon, Nov 09, 2020 at 09:49:38AM -0800, Jakub Kicinski wrote:
> On Mon, 9 Nov 2020 08:48:28 -0300 Thadeu Lima de Souza Cascardo wrote:
> > On Fri, Oct 16, 2020 at 03:30:16PM -0700, Jakub Kicinski wrote:
> > > On Tue, 13 Oct 2020 19:18:48 +0200 Kleber Sacilotto de Souza wrote:  
> > > > From: Thadeu Lima de Souza Cascardo 
> > > > 
> > > > When dccps_hc_tx_ccid is freed, ccid timers may still trigger. The 
> > > > reason
> > > > del_timer_sync can't be used is because this relies on keeping a 
> > > > reference
> > > > to struct sock. But as we keep a pointer to dccps_hc_tx_ccid and free 
> > > > that
> > > > during disconnect, the timer should really belong to struct dccp_sock.
> > > > 
> > > > This addresses CVE-2020-16119.
> > > > 
> > > > Fixes: 839a6094140a (net: dccp: Convert timers to use timer_setup())
> > > > Signed-off-by: Thadeu Lima de Souza Cascardo 
> > > > Signed-off-by: Kleber Sacilotto de Souza   
> > > 
> > > I've been mulling over this fix.
> > > 
> > > The layering violation really doesn't sit well.
> > > 
> > > We're reusing the timer object. What if we are really unlucky, the
> > > fires and gets blocked by a cosmic ray just as it's about to try to
> > > lock the socket, then user manages to reconnect, and timer starts
> > > again. Potentially with a different CCID algo altogether?
> > > 
> > > Is disconnect ever called under the BH lock?  Maybe plumb a bool
> > > argument through to ccid*_hc_tx_exit() and do a sk_stop_timer_sync()
> > > when called from disconnect()?
> > > 
> > > Or do refcounting on ccid_priv so that the timer holds both the socket
> > > and the priv?  
> > 
> > Sorry about too late a response. I was on vacation, then came back and 
> > spent a
> > couple of days testing this further, and had to switch to other tasks.
> > 
> > So, while testing this, I had to resort to tricks like having a very small
> > expire and enqueuing on a different CPU. Then, after some minutes, I hit a 
> > UAF.
> > That's with or without the first of the second patch.
> > 
> > I also tried to refcount ccid instead of the socket, keeping the timer on 
> > the
> > ccid, but that still hit the UAF, and that's when I had to switch tasks.
> 
> Hm, not instead, as well. I think trying cancel the timer _sync from
> the disconnect path would be the simplest solution, tho.
> 

I don't think so. On other paths, we would still have the possibility that:

CPU1: timer expires and is about to run
CPU2: calls stop_timer (which does not stop anything) and frees ccid
CPU1: timer runs and uses freed ccid

And those paths, IIUC, may be run under a SoftIRQ on the receive path, so would
not be able to call stop_timer_sync.

> > Oh, and in the meantime, I found one or two other fixes that we
> > should apply, will send them shortly.
> > 
> > But I would argue that we should apply the revert as it addresses the
> > CVE, without really regressing the other UAF, as I argued. Does that
> > make sense?
> 
> We can - it's always a little strange to go from one bug to a different
> without a fix - but the justification being that while the previous UAF
> required a race condition the new one is actually worst because it can 
> be triggered reliably?

Well, I am arguing here that commit 2677d20677314101293e6da0094ede7b5526d2b1
("dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()") doesn't
really fix anything. Whenever ccid_hx_tx_delete is called, that UAF might
happen, because the timer might trigger right after we free the ccid struct.

And, yes, on the other hand, we can reliably launch the DoS attack that is
fixed by the revert of that commit.

Thanks.
Cascardo.

Re: [PATCH 1/2] dccp: ccid: move timers to struct dccp_sock

2020-11-09 Thread Thadeu Lima de Souza Cascardo

On Fri, Oct 16, 2020 at 03:30:16PM -0700, Jakub Kicinski wrote:
> On Tue, 13 Oct 2020 19:18:48 +0200 Kleber Sacilotto de Souza wrote:
> > From: Thadeu Lima de Souza Cascardo 
> > 
> > When dccps_hc_tx_ccid is freed, ccid timers may still trigger. The reason
> > del_timer_sync can't be used is because this relies on keeping a reference
> > to struct sock. But as we keep a pointer to dccps_hc_tx_ccid and free that
> > during disconnect, the timer should really belong to struct dccp_sock.
> > 
> > This addresses CVE-2020-16119.
> > 
> > Fixes: 839a6094140a (net: dccp: Convert timers to use timer_setup())
> > Signed-off-by: Thadeu Lima de Souza Cascardo 
> > Signed-off-by: Kleber Sacilotto de Souza 
> 
> I've been mulling over this fix.
> 
> The layering violation really doesn't sit well.
> 
> We're reusing the timer object. What if we are really unlucky, the
> fires and gets blocked by a cosmic ray just as it's about to try to
> lock the socket, then user manages to reconnect, and timer starts
> again. Potentially with a different CCID algo altogether?
> 
> Is disconnect ever called under the BH lock?  Maybe plumb a bool
> argument through to ccid*_hc_tx_exit() and do a sk_stop_timer_sync()
> when called from disconnect()?
> 
> Or do refcounting on ccid_priv so that the timer holds both the socket
> and the priv?

Sorry about too late a response. I was on vacation, then came back and spent a
couple of days testing this further, and had to switch to other tasks.

So, while testing this, I had to resort to tricks like having a very small
expire and enqueuing on a different CPU. Then, after some minutes, I hit a UAF.
That's with or without the first of the second patch.

I also tried to refcount ccid instead of the socket, keeping the timer on the
ccid, but that still hit the UAF, and that's when I had to switch tasks. Oh,
and in the meantime, I found one or two other fixes that we should apply, will
send them shortly.

But I would argue that we should apply the revert as it addresses the CVE,
without really regressing the other UAF, as I argued. Does that make sense?

Thank you.
Cascardo.

Re: [PATCH 1/2] dccp: ccid: move timers to struct dccp_sock

2020-10-15 Thread Thadeu Lima de Souza Cascardo

On Wed, Oct 14, 2020 at 08:43:22PM -0700, Jakub Kicinski wrote:
> On Tue, 13 Oct 2020 19:18:48 +0200 Kleber Sacilotto de Souza wrote:
> > From: Thadeu Lima de Souza Cascardo 
> > 
> > When dccps_hc_tx_ccid is freed, ccid timers may still trigger. The reason
> > del_timer_sync can't be used is because this relies on keeping a reference
> > to struct sock. But as we keep a pointer to dccps_hc_tx_ccid and free that
> > during disconnect, the timer should really belong to struct dccp_sock.
> > 
> > This addresses CVE-2020-16119.
> > 
> > Fixes: 839a6094140a (net: dccp: Convert timers to use timer_setup())
> 
> Presumably you chose this commit because the fix won't apply beyond it?
> But it really fixes 2677d2067731 (dccp: don't free.. right?

Well, it should also fix cases where dccps_hc_tx_ccid{,_private} has been freed
right after the timer is stopped.

So, we could add:
Fixes: 2a91aa396739 ([DCCP] CCID2: Initial CCID2 (TCP-Like) implementation)
Fixes: 7c657876b63c ([DCCP]: Initial implementation)

But I wouldn't say that this fixes 2677d2067731, unless there is argument to
say that it fixes it because it claimed to fix what is being fixed here. But
even the code that it removed was supposed to be stopping the timer, so how
could it ever fix what it was claiming to fix?

Thanks.
Cascardo.

> 
> > Signed-off-by: Thadeu Lima de Souza Cascardo 
> > Signed-off-by: Kleber Sacilotto de Souza

Re: [LTP] [PATCH] syscall/ptrace08: Simplify the test.

2020-09-04 Thread Thadeu Lima de Souza Cascardo

On Fri, Sep 04, 2020 at 01:58:17PM +0200, Cyril Hrubis wrote:
> The original test was attempting to crash the kernel by setting a
> breakpoint on do_debug kernel function which, when triggered, caused an
> infinite loop in the kernel. The problem with this approach is that
> kernel internal function names are not stable at all and the name was
> changed recently, which made the test fail for no good reason.
> 
> So this patch changes the test to read the breakpoint address back
> instead, which also means that we can drop the /proc/kallsyms parsing as
> well.
> 
> Signed-off-by: Cyril Hrubis 
> CC: Andy Lutomirski 
> CC: Peter Zijlstra 
> CC: Thomas Gleixner 
> CC: Alexandre Chartre 

Hi, Cyril.

This is failing on our 4.4 and 4.15 kernels, though they passed with the
previous test and have commit f67b15037a7a applied.

So, this is dependent on some other behavior/commit that has changed. It passes
on 5.4, for example. I'll try to investigate further.

Cascardo.

> ---
>  testcases/kernel/syscalls/ptrace/ptrace08.c | 120 ++--
>  1 file changed, 60 insertions(+), 60 deletions(-)
> 
> diff --git a/testcases/kernel/syscalls/ptrace/ptrace08.c 
> b/testcases/kernel/syscalls/ptrace/ptrace08.c
> index 591aa0dd2..5587f0bbb 100644
> --- a/testcases/kernel/syscalls/ptrace/ptrace08.c
> +++ b/testcases/kernel/syscalls/ptrace/ptrace08.c
> @@ -5,8 +5,17 @@
>   *
>   * CVE-2018-1000199
>   *
> - * Test error handling when ptrace(POKEUSER) modifies debug registers.
> - * Even if the call returns error, it may create breakpoint in kernel code.
> + * Test error handling when ptrace(POKEUSER) modified x86 debug registers 
> even
> + * when the call returned error.
> + *
> + * When the bug was present we could create breakpoint in the kernel code,
> + * which shoudn't be possible at all. The original CVE caused a kernel crash 
> by
> + * setting a breakpoint on do_debug kernel function which, when triggered,
> + * caused an infinite loop. However we do not have to crash the kernel in 
> order
> + * to assert if kernel has been fixed or not. All we have to do is to try to
> + * set a breakpoint, on any kernel address, then read it back and check if 
> the
> + * value has been set or not.
> + *
>   * Kernel crash partially fixed in:
>   *
>   *  commit f67b15037a7a50c57f72e69a6d59941ad90a0f0f
> @@ -26,69 +35,42 @@
>  #include "tst_safe_stdio.h"
>  
>  #if defined(__i386__) || defined(__x86_64__)
> -#define SYMNAME_SIZE 256
> -#define KERNEL_SYM "do_debug"
>  
> -static unsigned long break_addr;
>  static pid_t child_pid;
>  
> -static void setup(void)
> -{
> - int fcount;
> - char endl, symname[256];
> - FILE *fr = SAFE_FOPEN("/proc/kallsyms", "r");
> -
> - /* Find address of do_debug() in /proc/kallsyms */
> - do {
> - fcount = fscanf(fr, "%lx %*c %255s%c", _addr, symname,
> - );
> -
> - if (fcount <= 0 && feof(fr))
> - break;
> -
> - if (fcount < 2) {
> - fclose(fr);
> - tst_brk(TBROK, "Unexpected data in /proc/kallsyms %d",
> - fcount);
> - }
> -
> - if (fcount >= 3 && endl != '\n')
> - while (!feof(fr) && fgetc(fr) != '\n');
> - } while (!feof(fr) && strcmp(symname, KERNEL_SYM));
> -
> - SAFE_FCLOSE(fr);
> -
> - if (strcmp(symname, KERNEL_SYM))
> - tst_brk(TBROK, "Cannot find address of kernel symbol \"%s\"",
> - KERNEL_SYM);
> -
> - if (!break_addr)
> - tst_brk(TCONF, "Addresses in /proc/kallsyms are hidden");
> -
> - tst_res(TINFO, "Kernel symbol \"%s\" found at 0x%lx", KERNEL_SYM,
> - break_addr);
> -}
> +#if defined(__x86_64__)
> +# define KERN_ADDR_MIN 0x8000
> +# define KERN_ADDR_MAX 0x
> +# define KERN_ADDR_BITS 64
> +#elif defined(__i386__)
> +# define KERN_ADDR_MIN 0xc000
> +# define KERN_ADDR_MAX 0x
> +# define KERN_ADDR_BITS 32
> +#endif
>  
> -static void debug_trap(void)
> +static void setup(void)
>  {
> - /* x86 instruction INT1 */
> - asm volatile (".byte 0xf1");
> + /*
> +  * When running in compat mode we can't pass 64 address to ptrace so we
> +  * have to skip the test.
> +  */
> + if (tst_kernel_bits() != KERN_ADDR_BITS)
> + tst_brk(TCONF, "Cannot pass 64bit kernel address in compat 
> mode");
>  }
>  
>  static void child_main(void)
>  {
>   raise(SIGSTOP);
> - /* wait for SIGCONT from parent */
> - debug_trap();
>   exit(0);
>  }
>  
> -static void run(void)
> +static void ptrace_try_kern_addr(unsigned long kern_addr)
>  {
>   int status;
> - pid_t child;
>  
> - child = child_pid = SAFE_FORK();
> + tst_res(TINFO, "Trying address 0x%lx", kern_addr);
> +
> + child_pid = SAFE_FORK();
>  
>   if (!child_pid)
>   child_main();
> @@ -103,22 +85,41 @@ static void run(void)
>

Re: [PATCH] net/scm: Fix typo in SCM_RIGHTS compat refactoring

2020-08-07 Thread Thadeu Lima de Souza Cascardo

On Fri, Aug 07, 2020 at 11:20:05AM -0700, Kees Cook wrote:
> When refactoring the SCM_RIGHTS code, I accidentally mis-merged my
> native/compat diffs, which entirely broke using SCM_RIGHTS in compat
> mode. Use the correct helper.
> 
> Reported-by: Christian Zigotzky 
> Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216156.html
> Reported-by: "Alex Xu (Hello71)" 
> Link: https://lore.kernel.org/lkml/1596812929.lz7fuo8r2w.none@localhost/
> Suggested-by: Thadeu Lima de Souza Cascardo 
> Fixes: c0029de50982 ("net/scm: Regularize compat handling of 
> scm_detach_fds()")
> Signed-off-by: Kees Cook 

Hi, Kees.

You might want to add the Teste-by line that Alex Xu sent. And if my ack adds
any value on top of Suggested-by, here it goes:

Acked-by: Thadeu Lima de Souza Cascardo 

Thanks.
Cascardo.

> ---
>  net/compat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/compat.c b/net/compat.c
> index 703acb51c698..95ce707a30a3 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -294,7 +294,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct 
> scm_cookie *scm)
>   (struct compat_cmsghdr __user *)msg->msg_control;
>   unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC 
> : 0;
>   int fdmax = min_t(int, scm_max_fds_compat(msg), scm->fp->count);
> - int __user *cmsg_data = CMSG_USER_DATA(cm);
> + int __user *cmsg_data = CMSG_COMPAT_DATA(cm);
>   int err = 0, i;
>  
>   for (i = 0; i < fdmax; i++) {
> -- 
> 2.25.1
> 
> 
> -- 
> Kees Cook

Re: wine fails to start with seccomp updates for v5.9-rc1

2020-08-07 Thread Thadeu Lima de Souza Cascardo

On Fri, Aug 07, 2020 at 08:48:46AM -0700, Linus Torvalds wrote:
> On Fri, Aug 7, 2020 at 8:19 AM Alex Xu (Hello71)  wrote:
> >
> > On Linus' master, wine fails to start with the following error:
> >
> > wine client error:0: write: Bad file descriptor
> >
> > This issue is not present on 5.8. It appears to be caused by failure to
> > write to a pipe FD received via SCM_RIGHTS. Therefore, I tried reverting
> > 9ecc6ea491f0, which resolved the issue.
> 
> Would you mind trying to bisect exactly where it happens?
> 

This report [1] seemed related and pointed out at c0029de50982 ("net/scm:
Regularize compat handling of scm_detach_fds()"). The use of CMSG_USER_DATA
instead of CMSG_COMPAT_DATA seems fishy.

Alex, can you try applying the patch below?

Cascardo.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216156.html

> I don't think any of the commits in that pull are supposed to change
> semantics, and while reverting the whole merge shows that yes, that's
> what brought in the problems, it would be good to pinpoint just which
> change breaks so that we can fix just that thing.
> 
> Kees, ideas?
> 
>  Linus

---
diff --git a/net/compat.c b/net/compat.c
index 703acb51c698..95ce707a30a3 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -294,7 +294,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct 
scm_cookie *scm)
(struct compat_cmsghdr __user *)msg->msg_control;
unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC 
: 0;
int fdmax = min_t(int, scm_max_fds_compat(msg), scm->fp->count);
-   int __user *cmsg_data = CMSG_USER_DATA(cm);
+   int __user *cmsg_data = CMSG_COMPAT_DATA(cm);
int err = 0, i;
 
for (i = 0; i < fdmax; i++) {

[PATCH] selftests/powerpc: return skip code for spectre_v2

2020-07-28 Thread Thadeu Lima de Souza Cascardo

When running under older versions of qemu of under newer versions with old
machine types, some security features will not be reported to the guest.
This will lead the guest OS to consider itself Vulnerable to spectre_v2.

So, spectre_v2 test fails in such cases when the host is mitigated and miss
predictions cannot be detected as expected by the test.

Make it return the skip code instead, for this particular case. We don't
want to miss the case when the test fails and the system reports as
mitigated or not affected. But it is not a problem to miss failures when
the system reports as Vulnerable.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 tools/testing/selftests/powerpc/security/spectre_v2.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/powerpc/security/spectre_v2.c 
b/tools/testing/selftests/powerpc/security/spectre_v2.c
index 8c6b982af2a8..d5445bfd63ed 100644
--- a/tools/testing/selftests/powerpc/security/spectre_v2.c
+++ b/tools/testing/selftests/powerpc/security/spectre_v2.c
@@ -183,6 +183,14 @@ int spectre_v2_test(void)
if (miss_percent > 15) {
printf("Branch misses > 15%% unexpected in this 
configuration!\n");
printf("Possible mis-match between reported & actual 
mitigation\n");
+   /* Such a mismatch may be caused by a guest system
+* reporting as vulnerable when the host is mitigated.
+* Return skip code to avoid detecting this as an
+* error. We are not vulnerable and reporting otherwise,
+* so missing such a mismatch is safe.
+*/
+   if (state == VULNERABLE)
+   return 4;
return 1;
}
break;
-- 
2.25.1

[PATCH] nbd: allocate sufficient space for NBD_CMD_STATUS

2020-06-18 Thread Thadeu Lima de Souza Cascardo

The nest attribute NBD_ATTR_DEVICE_LIST was not accounted for when
allocating the message, resulting in -EMSGSIZE.

As __alloc_skb aligns size requests to SMP_CACHE_BYTES and SLUB will end up
allocating more than requested, this can hardly be reproduced on most
setups.

However, I managed to test this on a 32-bit x86 with 15 entries, by loading
with nbds_max=15. It failed with -EMSGSIZE, while it worked with 14 or 16
entries.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 drivers/block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 43cff01a5a67..19551d8ca355 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -2265,6 +2265,7 @@ static int nbd_genl_status(struct sk_buff *skb, struct 
genl_info *info)
msg_size = nla_total_size(nla_attr_size(sizeof(u32)) +
  nla_attr_size(sizeof(u8)));
msg_size *= (index == -1) ? nbd_total_devices : 1;
+   msg_size += nla_total_size(0); /* for NBD_ATTR_DEVICE_LIST */
 
reply = genlmsg_new(msg_size, GFP_KERNEL);
if (!reply)
-- 
2.25.1

[PATCH] nbd: allocate sufficient space for NBD_CMD_STATUS

2020-06-18 Thread Thadeu Lima de Souza Cascardo

The nest attribute NBD_ATTR_DEVICE_LIST was not accounted for when
allocating the message, resulting in -EMSGSIZE.

As __alloc_skb aligns size requests to SMP_CACHE_BYTES and SLUB will end up
allocating more than requested, this can hardly be reproduced on most
setups.

However, I managed to test this on a 32-bit x86 with 15 entries, by loading
with nbds_max=15. It failed with -EMSGSIZE, while it worked with 14 or 16
entries.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 drivers/block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 43cff01a5a67..19551d8ca355 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -2265,6 +2265,7 @@ static int nbd_genl_status(struct sk_buff *skb, struct 
genl_info *info)
msg_size = nla_total_size(nla_attr_size(sizeof(u32)) +
  nla_attr_size(sizeof(u8)));
msg_size *= (index == -1) ? nbd_total_devices : 1;
+   msg_size += nla_total_size(0); /* for NBD_ATTR_DEVICE_LIST */
 
reply = genlmsg_new(msg_size, GFP_KERNEL);
if (!reply)
-- 
2.25.1

Re: [PATCH] x86/speculation/srbds: do not try to turn mitigation off when not supported

2020-06-15 Thread Thadeu Lima de Souza Cascardo

On Mon, Jun 15, 2020 at 10:28:58AM +0200, Borislav Petkov wrote:
> On Tue, Jun 09, 2020 at 02:43:13PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > When SRBDS is mitigated by TSX OFF, update_srbds_msr will still read and
> 
> Are you talking about this case in srbds_select_mitigation():
> 
> if ((ia32_cap & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM))
> srbds_mitigation = SRBDS_MITIGATION_TSX_OFF;
> 
> ?
> 

That's the case that it hits, correct.

> and you have a system which:
> 
>  * Check to see if this is one of the MDS_NO systems supporting
>  * TSX that are only exposed to SRBDS when TSX is enabled.
> 
> i.e., no SRBDS microcode for it and the fix is to disable TSX?

It was booted without the microcode update. There was microcode available, but
systems may be booted without it, thus causing warnings due to the MSR
read/write.

> 
> If so, I think the right fix should be to do:
> 
>   if (!boot_cpu_has(X86_FEATURE_SRBDS_CTRL))
>   return;
> 
> in update_srbds_msr() with a comment above it explaining why that check
> is being done.

That's exactly the fix in the patch I sent, right? Do you want me to resend
with a comment, then?

Thanks.
Cascardo.

> 
> Hmmm.
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

[PATCH] x86/speculation/srbds: do not try to turn mitigation off when not supported

2020-06-09 Thread Thadeu Lima de Souza Cascardo

When SRBDS is mitigated by TSX OFF, update_srbds_msr will still read and
write to MSR_IA32_MCU_OPT_CTRL even when that is not supported by the
microcode.

Checking for X86_FEATURE_SRBDS_CTRL as a CPU feature available makes more
sense than checking for SRBDS_MITIGATION_UCODE_NEEDED as the found
"mitigation".

Signed-off-by: Thadeu Lima de Souza Cascardo 
Acked-by: John Johansen 
Acked-by: Steve Beattie 
Cc: sta...@vger.kernel.org
---
 arch/x86/kernel/cpu/bugs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index b6f887be440c..ee5bdca7fd30 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -432,7 +432,7 @@ void update_srbds_msr(void)
if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
return;
 
-   if (srbds_mitigation == SRBDS_MITIGATION_UCODE_NEEDED)
+   if (!boot_cpu_has(X86_FEATURE_SRBDS_CTRL))
return;
 
rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
-- 
2.25.1

[PATCH v2] selftests/seccomp: use 90s as timeout

2020-06-01 Thread Thadeu Lima de Souza Cascardo

As seccomp_benchmark tries to calibrate how many samples will take more
than 5 seconds to execute, it may end up picking up a number of samples
that take 10 (but up to 12) seconds. As the calibration will take double
that time, it takes around 20 seconds. Then, it executes the whole thing
again, and then once more, with some added overhead. So, the thing might
take more than 40 seconds, which is too close to the 45s timeout.

That is very dependent on the system where it's executed, so may not be
observed always, but it has been observed on x86 VMs. Using a 90s timeout
seems safe enough.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 tools/testing/selftests/seccomp/settings | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tools/testing/selftests/seccomp/settings

diff --git a/tools/testing/selftests/seccomp/settings 
b/tools/testing/selftests/seccomp/settings
new file mode 100644
index ..ba4d85f74cd6
--- /dev/null
+++ b/tools/testing/selftests/seccomp/settings
@@ -0,0 +1 @@
+timeout=90
-- 
2.25.1

[PATCH] selftests/seccomp: use 90s as timeout

2020-06-01 Thread Thadeu Lima de Souza Cascardo

As seccomp_benchmark tries to calibrate how many samples will take more
than 5 seconds to execute, it may end up picking up a number of samples
that take 10 (but up to 12) seconds. As the calibration will take double
that time, it takes around 20 seconds. Then, it executes the whole thing
again, and then once more, with some added overhead. So, the thing might
take more than 40 seconds, which is too close to the 45s timeout.

That is very dependent on the system where it's executed, so may not be
observed always, but it has been observed on x86 VMs. Using a 90s timeout
seems safe enough.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 tools/testing/selftests/seccomp/settings | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/seccomp/settings 
b/tools/testing/selftests/seccomp/settings
index d61f00d8cad3..ba4d85f74cd6 100644
--- a/tools/testing/selftests/seccomp/settings
+++ b/tools/testing/selftests/seccomp/settings
@@ -1 +1 @@
-90
+timeout=90
-- 
2.25.1

[tip: refs/heads/timers/urgent] alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP

2019-09-05 Thread tip-bot2 for Thadeu Lima de Souza Cascardo

The following commit has been merged into the refs/heads/timers/urgent branch 
of tip:

Commit-ID: f18ddc13af981ce3c7b7f26925f099e7c6929aba
Gitweb:
https://git.kernel.org/tip/f18ddc13af981ce3c7b7f26925f099e7c6929aba
Author:Thadeu Lima de Souza Cascardo 
AuthorDate:Tue, 03 Sep 2019 14:18:02 -03:00
Committer: Thomas Gleixner 
CommitterDate: Thu, 05 Sep 2019 21:19:26 +02:00

alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP

ENOTSUPP is not supposed to be returned to userspace. This was found on an
OpenPower machine, where the RTC does not support set_alarm.

On that system, a clock_nanosleep(CLOCK_REALTIME_ALARM, ...) results in
"524 Unknown error 524"

Replace it with EOPNOTSUPP which results in the expected "95 Operation not
supported" error.

Fixes: 1c6b39ad3f01 (alarmtimers: Return -ENOTSUPP if no RTC device is present)
Signed-off-by: Thadeu Lima de Souza Cascardo 
Signed-off-by: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20190903171802.28314-1-casca...@canonical.com

---
 kernel/time/alarmtimer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 57518ef..b7d75a9 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -672,7 +672,7 @@ static int alarm_timer_create(struct k_itimer *new_timer)
enum  alarmtimer_type type;
 
if (!alarmtimer_get_rtcdev())
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
if (!capable(CAP_WAKE_ALARM))
return -EPERM;
@@ -790,7 +790,7 @@ static int alarm_timer_nsleep(const clockid_t which_clock, 
int flags,
int ret = 0;
 
if (!alarmtimer_get_rtcdev())
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
if (flags & ~TIMER_ABSTIME)
return -EINVAL;

Re: [PATCH v3] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code

2019-09-04 Thread Thadeu Lima de Souza Cascardo

On Wed, Sep 04, 2019 at 12:39:25PM +0200, Ingo Molnar wrote:
> 
> * Dietmar Eggemann  wrote:
> 
> > > -v3 attached. Build and minimally boot tested.
> > > 
> > > Thanks,
> > > 
> > >   Ingo
> > > 
> > 
> > This patch fixes the issue (almost).
> > 
> > LTP's sched_getattr01 passes again. But IMHO the prio 'chrt -p $$'
> > should be 0 instead of -65536.
> 
> Yeah, I forgot to zero-initialize the structure ...
> 
> Does the attached -v4 work any better for you?
> 
> Thanks,
> 
>   Ingo

Hi, Ingo.

Thanks for the patch. It works just fine for me, I have only one comment about
it, below.

Acked-by: Thadeu Lima de Souza Cascardo 
Tested-by: Thadeu Lima de Souza Cascardo 

I also tested LTP, chrt, and that 5.2 (before the breaking commit) and 5.3 with
your patch behave the same when the user size is larger than what the kernel
knows, the return is 0 and the struct size is set to the known value.

There is one odd behaviour now, which is whenever size is set between VER0 and
VER1, we will have a partial struct filled up, instead of getting E2BIG or
EINVAL.

> 
> ===>
> Date: Wed, 4 Sep 2019 09:55:32 +0200
> Subject: [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify 
> sched_read_attr() ABI logic and code
> 
> Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:
> 
>   $ chrt -p $$
>   chrt: failed to get pid 26306's policy: Argument list too long
> 
> and he has root-caused the bug to the following commit increasing sched_attr
> size and breaking sched_read_attr() into returning -EFBIG:
> 
>   a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization 
> clamping")
> 
> The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
> logic of checking non-zero bits in new ABI components is arguably broken,
> and pretty much any extension of the ABI will spuriously break the ABI.
> That's way too fragile.
> 
> Instead implement the perf syscall's extensible ABI instead, which we
> already implement on the sched_setattr() side:
> 
>  - if user-attributes have the same size as kernel attributes then the
>logic is unchanged.
> 
>  - if user-attributes are larger than the kernel knows about then simply
>skip the extra bits, but set attr->size to the (smaller) kernel size
>so that tooling can (in principle) handle older kernel as well.
> 
>  - if user-attributes are smaller than the kernel knows about then just
>copy whatever user-space can accept.
> 
> Also clean up the whole logic:
> 
>  - Simplify the code flow - there's no need for 'ret' for example.
> 
>  - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
>always know which side we are dealing with.
> 
>  - Why is it called 'read' when what it does is to copy to user? This
>code is so far away from VFS read() semantics that the naming is
>actively confusing. Name it sched_attr_copy_to_user() instead, which
>mirrors other copy_to_user() functionality.
> 
>  - Move the attr->size assignment from the head of sched_getattr() to the
>sched_attr_copy_to_user() function. Nothing else within the kernel
>should care about the size of the structure.
> 
> With these fixes the sched_getattr() syscall now nicely supports an
> extensible ABI in both a forward and backward compatible fashion, and
> will also fix the chrt bug.
> 
> As an added bonus the bogus -EFBIG return is removed as well, which as
> Thadeu noted should have been -E2BIG to begin with.
> 
> Reported-by: Thadeu Lima de Souza Cascardo 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Jiri Olsa 
> Cc: Linus Torvalds 
> Cc: Patrick Bellasi 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support 
> utilization clamping")
> Link: https://lkml.kernel.org/r/20190904075532.ga26...@gmail.com
> Signed-off-by: Ingo Molnar 
> ---
>  kernel/sched/core.c | 82 
> +++--
>  1 file changed, 42 insertions(+), 40 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 010d578118d6..3c64eb28dd70 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5105,39 +5105,44 @@ SYSCALL_DEFINE2(sched_getparam, pid_t, pid, struct 
> sched_param __user *, param)
>   return retval;
>  }
>  
> -static int sched_read_attr(struct sched_attr __user *uattr,
> -struct sched_attr *attr,
> -unsigned int usize)
> +/*
> + * Copy the kernel size attribute structure (which might be larger
> + * than wh

[PATCH] alarmtimer: use EOPNOTSUPP instead of ENOTSUPP

2019-09-03 Thread Thadeu Lima de Souza Cascardo

ENOTSUPP is not supposed to be returned to userspace. This was found on an
OpenPower machine, where the RTC does not support set_alarm.

On that system, before the patch, a clock_nanosleep(CLOCK_REALTIME_ALARM, ...)
would result in "524 Unknown error 524", while after the patch, we get
"95 Operation not supported".

Signed-off-by: Thadeu Lima de Souza Cascardo 
Fixes: 1c6b39ad3f01 (alarmtimers: Return -ENOTSUPP if no RTC device is present)
---
 kernel/time/alarmtimer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 57518efc3810..b7d75a9e8ccf 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -672,7 +672,7 @@ static int alarm_timer_create(struct k_itimer *new_timer)
enum  alarmtimer_type type;
 
if (!alarmtimer_get_rtcdev())
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
if (!capable(CAP_WAKE_ALARM))
return -EPERM;
@@ -790,7 +790,7 @@ static int alarm_timer_nsleep(const clockid_t which_clock, 
int flags,
int ret = 0;
 
if (!alarmtimer_get_rtcdev())
-   return -ENOTSUPP;
+   return -EOPNOTSUPP;
 
if (flags & ~TIMER_ABSTIME)
return -EINVAL;
-- 
2.20.1

[PATCH 1/2] sched: sched_getattr should return E2BIG, not EFBIG

2019-09-03 Thread Thadeu Lima de Souza Cascardo

As documented and the behavior before commit 22400674945c (sched: Simplify
return logic in sched_read_attr()), sched_getattr should return E2BIG instead
of EFBIG when there is not enough space to copy sched_attr.

Signed-off-by: Thadeu Lima de Souza Cascardo 
Fixes: 22400674945c (sched: Simplify return logic in sched_read_attr())
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2477893dd069..0fd67281e656 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5129,7 +5129,7 @@ static int sched_read_attr(struct sched_attr __user 
*uattr,
 
for (; addr < end; addr++) {
if (*addr)
-   return -EFBIG;
+   return -E2BIG;
}
 
attr->size = usize;
-- 
2.20.1

[PATCH 2/2] sched: allow sched_getattr with old size

2019-09-03 Thread Thadeu Lima de Souza Cascardo

After commit a509a7cd7974 (sched/uclamp: Extend sched_setattr() to support
utilization clamping), using sched_getattr with size 48 will return E2BIG.

This breaks, for example, chrt.
$ chrt -p $$
chrt: failed to get pid 26306's policy: Argument list too long
$

With this fix, when using the old size, the utilization clamping values will be
absent from sched_attr. When using the new size or some larger size, they will
be present.

After the fix, chrt works again.
$ chrt -p $$
pid 4649's current scheduling policy: SCHED_OTHER
pid 4649's current scheduling priority: 0
$

The drawback with this solution is that userspace will ignore there are
non-default utilization clamps, but it's arguable whether returning E2BIG in
this case makes sense when that same userspace doesn't know about those values
anyway.

Signed-off-by: Thadeu Lima de Souza Cascardo 
Fixes: a509a7cd7974 (sched/uclamp: Extend sched_setattr() to support 
utilization clamping)
Cc: Patrick Bellasi 
---
 kernel/sched/core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0fd67281e656..0ccc7fa80da6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5183,8 +5183,10 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct 
sched_attr __user *, uattr,
attr.sched_nice = task_nice(p);
 
 #ifdef CONFIG_UCLAMP_TASK
-   attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
-   attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+   if (size >= SCHED_ATTR_SIZE_VER1) {
+   attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
+   attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
+   }
 #endif
 
rcu_read_unlock();
-- 
2.20.1

Re: [PATCH] selftests/ftrace: Select an existing function in kprobe_eventname test

2019-07-08 Thread Thadeu Lima de Souza Cascardo

On Fri, Mar 22, 2019 at 03:09:23PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt (VMware) 
> 
> Running the ftrace selftests on the latest kernel caused the
> kprobe_eventname test to fail. It was due to the test that searches for
> a function with at "dot" in the name and adding a probe to that.
> Unfortunately, for this test, it picked:
> 
>  optimize_nops.isra.2.cold.4
> 
> Which happens to be marked as "__init", which means it no longer exists
> in the kernel! (kallsyms keeps those function names around for tracing
> purposes)
> 
> As only functions that still exist are in the
> available_filter_functions file, as they are removed when the functions
> are freed at boot or module exit, have the test search for a function
> with ".isra." in the name as well as being in the
> available_filter_functions (if the file exists).
> 

This fixes a similar problem for me.

Tested-by: Thadeu Lima de Souza Cascardo 

> Signed-off-by: Steven Rostedt (VMware) 
> ---
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc 
> b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
> index 3fb70e01b1fe..3ff236719b6e 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
> @@ -24,7 +24,21 @@ test -d events/kprobes2/event2 || exit_failure
>  
>  :;: "Add an event on dot function without name" ;:
>  
> -FUNC=`grep -m 10 " [tT] .*\.isra\..*$" /proc/kallsyms | tail -n 1 | cut -f 3 
> -d " "`
> +find_dot_func() {
> + if [ ! -f available_filter_functions ]; then
> + grep -m 10 " [tT] .*\.isra\..*$" /proc/kallsyms | tail -n 1 | 
> cut -f 3 -d " "
> + return;
> + fi
> +
> + grep " [tT] .*\.isra\..*" /proc/kallsyms | cut -f 3 -d " " | while read 
> f; do
> + if grep -s $f available_filter_functions; then
> + echo $f
> + break
> + fi
> + done
> +}
> +
> +FUNC=`find_dot_func | tail -n 1`
>  [ "x" != "x$FUNC" ] || exit_unresolved
>  echo "p $FUNC" > kprobe_events
>  EVENT=`grep $FUNC kprobe_events | cut -f 1 -d " " | cut -f 2 -d:`

Re: [PATCH] selftests/ftrace: avoid failure when trying to probe a notrace function

2019-07-08 Thread Thadeu Lima de Souza Cascardo

On Mon, Jul 08, 2019 at 02:56:18PM -0400, Steven Rostedt wrote:
> On Mon,  8 Jul 2019 15:19:33 -0300
> Thadeu Lima de Souza Cascardo  wrote:
> 
> > Check that the function is on available_filter_functions. If it's not,
> > mark the test as unresolved, instead of failing it.
> > 
> 
> Actually, I sent this out a while ago:
> 
>   http://lkml.kernel.org/r/20190322150923.1b58e...@gandalf.local.home
> 
> Does that fix it for you?
> 
> -- Steve

Yes, that fix it for me, let me reply to the original message.

Thanks!
Cascardo.

> 
> 
> > Signed-off-by: Thadeu Lima de Souza Cascardo 
> > ---
> >  tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git 
> > a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc 
> > b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
> > index 3fb70e01b1fe..e4dff034da12 100644
> > --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
> > @@ -26,6 +26,7 @@ test -d events/kprobes2/event2 || exit_failure
> >  
> >  FUNC=`grep -m 10 " [tT] .*\.isra\..*$" /proc/kallsyms | tail -n 1 | cut -f 
> > 3 -d " "`
> >  [ "x" != "x$FUNC" ] || exit_unresolved
> > +grep -n "$FUNC" available_filter_functions || exit_unresolved
> >  echo "p $FUNC" > kprobe_events
> >  EVENT=`grep $FUNC kprobe_events | cut -f 1 -d " " | cut -f 2 -d:`
> >  [ "x" != "x$EVENT" ] || exit_failure
>

[PATCH] selftests/ftrace: avoid failure when trying to probe a notrace function

2019-07-08 Thread Thadeu Lima de Souza Cascardo

Check that the function is on available_filter_functions. If it's not,
mark the test as unresolved, instead of failing it.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
index 3fb70e01b1fe..e4dff034da12 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
@@ -26,6 +26,7 @@ test -d events/kprobes2/event2 || exit_failure
 
 FUNC=`grep -m 10 " [tT] .*\.isra\..*$" /proc/kallsyms | tail -n 1 | cut -f 3 
-d " "`
 [ "x" != "x$FUNC" ] || exit_unresolved
+grep -n "$FUNC" available_filter_functions || exit_unresolved
 echo "p $FUNC" > kprobe_events
 EVENT=`grep $FUNC kprobe_events | cut -f 1 -d " " | cut -f 2 -d:`
 [ "x" != "x$EVENT" ] || exit_failure
-- 
2.20.1

[tip:perf/urgent] perf annotate: Fix build on 32 bit for BPF annotation

2019-05-02 Thread tip-bot for Thadeu Lima de Souza Cascardo

Commit-ID:  01e985e900d3e602e9b1a55372a8e5274012a417
Gitweb: https://git.kernel.org/tip/01e985e900d3e602e9b1a55372a8e5274012a417
Author: Thadeu Lima de Souza Cascardo 
AuthorDate: Wed, 3 Apr 2019 16:44:52 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 2 May 2019 16:00:19 -0400

perf annotate: Fix build on 32 bit for BPF annotation

Commit 6987561c9e86 ("perf annotate: Enable annotation of BPF programs") adds
support for BPF programs annotations but the new code does not build on 32-bit.

Signed-off-by: Thadeu Lima de Souza Cascardo 
Acked-by: Song Liu 
Fixes: 6987561c9e86 ("perf annotate: Enable annotation of BPF programs")
Link: http://lkml.kernel.org/r/20190403194452.10845-1-casca...@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/annotate.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index c8b01176c9e1..09762985c713 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1714,8 +1714,8 @@ static int symbol__disassemble_bpf(struct symbol *sym,
if (dso->binary_type != DSO_BINARY_TYPE__BPF_PROG_INFO)
return -1;
 
-   pr_debug("%s: handling sym %s addr %lx len %lx\n", __func__,
-sym->name, sym->start, sym->end - sym->start);
+   pr_debug("%s: handling sym %s addr %" PRIx64 " len %" PRIx64 "\n", 
__func__,
+ sym->name, sym->start, sym->end - sym->start);
 
memset(tpath, 0, sizeof(tpath));
perf_exe(tpath, sizeof(tpath));
@@ -1740,7 +1740,7 @@ static int symbol__disassemble_bpf(struct symbol *sym,
info_linear = info_node->info_linear;
sub_id = dso->bpf_prog.sub_id;
 
-   info.buffer = (void *)(info_linear->info.jited_prog_insns);
+   info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
info.buffer_length = info_linear->info.jited_prog_len;
 
if (info_linear->info.nr_line_info)
@@ -1776,7 +1776,7 @@ static int symbol__disassemble_bpf(struct symbol *sym,
const char *srcline;
u64 addr;
 
-   addr = pc + ((u64 *)(info_linear->info.jited_ksyms))[sub_id];
+   addr = pc + ((u64 
*)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
count = disassemble(pc, );
 
if (prog_linfo)

PING: Re: [PATCH] perf annotate: Fix build on 32 bit for BPF annotation

2019-04-23 Thread Thadeu Lima de Souza Cascardo

On Wed, Apr 03, 2019 at 09:34:02PM +, Song Liu wrote:
> 
> 
> > On Apr 3, 2019, at 12:44 PM, Thadeu Lima de Souza Cascardo 
> >  wrote:
> > 
> > Commit 6987561c9e86 ("perf annotate: Enable annotation of BPF programs") 
> > adds
> > support for BPF programs annotations but the new code does not build on 
> > 32-bit.
> > 
> > Fixes: 6987561c9e86 ("perf annotate: Enable annotation of BPF programs")
> > Signed-off-by: Thadeu Lima de Souza Cascardo 
> 
> Acked-by: Song Liu 
> 
> Thanks for the fix!
> Song
> 

Thanks for the ack.

Pinging for getting this applied, as this is a build failure on i386.

Thanks.
Cascardo.

> > ---
> > tools/perf/util/annotate.c | 8 
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > index c8b01176c9e1..09762985c713 100644
> > --- a/tools/perf/util/annotate.c
> > +++ b/tools/perf/util/annotate.c
> > @@ -1714,8 +1714,8 @@ static int symbol__disassemble_bpf(struct symbol *sym,
> > if (dso->binary_type != DSO_BINARY_TYPE__BPF_PROG_INFO)
> > return -1;
> > 
> > -   pr_debug("%s: handling sym %s addr %lx len %lx\n", __func__,
> > -sym->name, sym->start, sym->end - sym->start);
> > +   pr_debug("%s: handling sym %s addr %" PRIx64 " len %" PRIx64 "\n", 
> > __func__,
> > + sym->name, sym->start, sym->end - sym->start);
> > 
> > memset(tpath, 0, sizeof(tpath));
> > perf_exe(tpath, sizeof(tpath));
> > @@ -1740,7 +1740,7 @@ static int symbol__disassemble_bpf(struct symbol *sym,
> > info_linear = info_node->info_linear;
> > sub_id = dso->bpf_prog.sub_id;
> > 
> > -   info.buffer = (void *)(info_linear->info.jited_prog_insns);
> > +   info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
> > info.buffer_length = info_linear->info.jited_prog_len;
> > 
> > if (info_linear->info.nr_line_info)
> > @@ -1776,7 +1776,7 @@ static int symbol__disassemble_bpf(struct symbol *sym,
> > const char *srcline;
> > u64 addr;
> > 
> > -   addr = pc + ((u64 *)(info_linear->info.jited_ksyms))[sub_id];
> > +   addr = pc + ((u64 
> > *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
> > count = disassemble(pc, );
> > 
> > if (prog_linfo)
> > -- 
> > 2.20.1
> > 
>

[PATCH] perf annotate: Fix build on 32 bit for BPF annotation

2019-04-03 Thread Thadeu Lima de Souza Cascardo

Commit 6987561c9e86 ("perf annotate: Enable annotation of BPF programs") adds
support for BPF programs annotations but the new code does not build on 32-bit.

Fixes: 6987561c9e86 ("perf annotate: Enable annotation of BPF programs")
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 tools/perf/util/annotate.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index c8b01176c9e1..09762985c713 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1714,8 +1714,8 @@ static int symbol__disassemble_bpf(struct symbol *sym,
if (dso->binary_type != DSO_BINARY_TYPE__BPF_PROG_INFO)
return -1;
 
-   pr_debug("%s: handling sym %s addr %lx len %lx\n", __func__,
-sym->name, sym->start, sym->end - sym->start);
+   pr_debug("%s: handling sym %s addr %" PRIx64 " len %" PRIx64 "\n", 
__func__,
+ sym->name, sym->start, sym->end - sym->start);
 
memset(tpath, 0, sizeof(tpath));
perf_exe(tpath, sizeof(tpath));
@@ -1740,7 +1740,7 @@ static int symbol__disassemble_bpf(struct symbol *sym,
info_linear = info_node->info_linear;
sub_id = dso->bpf_prog.sub_id;
 
-   info.buffer = (void *)(info_linear->info.jited_prog_insns);
+   info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
info.buffer_length = info_linear->info.jited_prog_len;
 
if (info_linear->info.nr_line_info)
@@ -1776,7 +1776,7 @@ static int symbol__disassemble_bpf(struct symbol *sym,
const char *srcline;
u64 addr;
 
-   addr = pc + ((u64 *)(info_linear->info.jited_ksyms))[sub_id];
+   addr = pc + ((u64 
*)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
count = disassemble(pc, );
 
if (prog_linfo)
-- 
2.20.1

Re: [PATCH] nvme: create 'paths' entries for hidden controllers

2018-11-01 Thread Thadeu Lima de Souza Cascardo

On Fri, Oct 05, 2018 at 09:32:45AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 28, 2018 at 04:17:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > When using initramfs-tools with only the necessary dependencies to mount
> > the root filesystem, it will fail to include nvme drivers for a root on a
> > multipath nvme. That happens because the slaves relationship is not
> > present.
> > 
> > As discussed in [1], using slaves will break lsblk, because the slaves are
> > hidden from userspace, that is, they have no real block device, just an
> > entry under sysfs.
> > 
> > Introducing the paths subdir and using that on initramfs-tools makes it
> > possible to now boot a system with nvme multipath as root.
> 
> Do we need documentation how these paths links are supposed to work?
> Who is going to parse them?

Hi, Christoph.

I have just sent a v2 against block/for-next with a Documentation file
describing it. The first intended user is initramfs-tools, documented there as
well.

Thanks.
Cascardo.

Re: [PATCH] nvme: create 'paths' entries for hidden controllers

2018-11-01 Thread Thadeu Lima de Souza Cascardo

On Fri, Oct 05, 2018 at 09:32:45AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 28, 2018 at 04:17:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > When using initramfs-tools with only the necessary dependencies to mount
> > the root filesystem, it will fail to include nvme drivers for a root on a
> > multipath nvme. That happens because the slaves relationship is not
> > present.
> > 
> > As discussed in [1], using slaves will break lsblk, because the slaves are
> > hidden from userspace, that is, they have no real block device, just an
> > entry under sysfs.
> > 
> > Introducing the paths subdir and using that on initramfs-tools makes it
> > possible to now boot a system with nvme multipath as root.
> 
> Do we need documentation how these paths links are supposed to work?
> Who is going to parse them?

Hi, Christoph.

I have just sent a v2 against block/for-next with a Documentation file
describing it. The first intended user is initramfs-tools, documented there as
well.

Thanks.
Cascardo.

[PATCH v2] nvme: create 'paths' entries for hidden controllers

2018-11-01 Thread Thadeu Lima de Souza Cascardo

When using initramfs-tools with only the necessary dependencies to mount
the root filesystem, it will fail to include nvme drivers for a root on a
multipath nvme. That happens because the slaves relationship is not
present.

As discussed in [1], using slaves will break lsblk, because the slaves are
hidden from userspace, that is, they have no real block device, just an
entry under sysfs.

Introducing the paths subdir and using that on initramfs-tools makes it
possible to now boot a system with nvme multipath as root.

[1] https://www.spinics.net/lists/stable/msg222779.html

Cc: Christoph Hellwig 
Cc: Potnuri Bharat Teja 
Cc: Keith Busch 
Cc: Hannes Reinecke 
Cc: Martin K. Petersen 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 Documentation/ABI/testing/sysfs-block-nvme | 10 
 drivers/nvme/host/core.c   |  2 ++
 drivers/nvme/host/multipath.c  | 29 --
 drivers/nvme/host/nvme.h   |  9 +++
 4 files changed, 48 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-block-nvme

diff --git a/Documentation/ABI/testing/sysfs-block-nvme 
b/Documentation/ABI/testing/sysfs-block-nvme
new file mode 100644
index ..3fe51b7be1e1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-block-nvme
@@ -0,0 +1,10 @@
+What:  /sys/block/nvme*/paths
+Date:  Oct, 2019
+KernelVersion: v4.21
+Contact:   Thadeu Lima de Souza Cascardo 
+Description:
+   This is a directory containing symlinks to other block
+   devices, when the block device is a nvme multipath
+   device.
+Users: initramfs-tools
+
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9e4a30b05bd2..06be47e878f5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3115,6 +3115,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid)
device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups);
 
nvme_mpath_add_disk(ns, id);
+   nvme_mpath_add_disk_links(ns);
nvme_fault_inject_init(ns);
kfree(id);
 
@@ -3138,6 +3139,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
nvme_fault_inject_fini(ns);
if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
+   nvme_mpath_remove_disk_links(ns);
del_gendisk(ns->disk);
blk_cleanup_queue(ns->queue);
if (blk_get_integrity(ns->disk))
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 5e3cc8c59a39..65dabe7d6d7c 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -317,9 +317,12 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
if (!head->disk)
return;
 
-   if (!(head->disk->flags & GENHD_FL_UP))
+   if (!(head->disk->flags & GENHD_FL_UP)) {
+   struct kobject *hd_kobj = _to_dev(head->disk)->kobj;
device_add_disk(>subsys->dev, head->disk,
nvme_ns_id_attr_groups);
+   head->path_dir = kobject_create_and_add("paths", hd_kobj);
+   }
 
if (nvme_path_is_optimized(ns)) {
int node, srcu_idx;
@@ -530,6 +533,19 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct 
nvme_id_ns *id)
}
 }
 
+void nvme_mpath_add_disk_links(struct nvme_ns *ns)
+{
+   struct kobject *path_disk_kobj;
+
+   if (!ns->head->disk)
+   return;
+
+   path_disk_kobj = _to_dev(ns->disk)->kobj;
+   if (sysfs_create_link(ns->head->path_dir, path_disk_kobj,
+   kobject_name(path_disk_kobj)))
+   return;
+}
+
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
if (!head->disk)
@@ -541,9 +557,19 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
kblockd_schedule_work(>requeue_work);
flush_work(>requeue_work);
blk_cleanup_queue(head->disk->queue);
+   kobject_put(head->path_dir);
put_disk(head->disk);
 }
 
+void nvme_mpath_remove_disk_links(struct nvme_ns *ns)
+{
+   if (!ns->head->disk)
+   return;
+
+   sysfs_remove_link(ns->head->path_dir,
+   kobject_name(_to_dev(ns->disk)->kobj));
+}
+
 int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 {
int error;
@@ -593,4 +619,3 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
 {
kfree(ctrl->ana_log_buf);
 }
-
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9fefba039d1e..6093649d4696 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -287,6 +287,7 @@ struct nvme_ns_head {
int instance;
 #ifdef CONFIG_NVME_MULTIPATH
struct gendisk  *disk;
+   struct kobject

[PATCH v2] nvme: create 'paths' entries for hidden controllers

2018-11-01 Thread Thadeu Lima de Souza Cascardo

When using initramfs-tools with only the necessary dependencies to mount
the root filesystem, it will fail to include nvme drivers for a root on a
multipath nvme. That happens because the slaves relationship is not
present.

As discussed in [1], using slaves will break lsblk, because the slaves are
hidden from userspace, that is, they have no real block device, just an
entry under sysfs.

Introducing the paths subdir and using that on initramfs-tools makes it
possible to now boot a system with nvme multipath as root.

[1] https://www.spinics.net/lists/stable/msg222779.html

Cc: Christoph Hellwig 
Cc: Potnuri Bharat Teja 
Cc: Keith Busch 
Cc: Hannes Reinecke 
Cc: Martin K. Petersen 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 Documentation/ABI/testing/sysfs-block-nvme | 10 
 drivers/nvme/host/core.c   |  2 ++
 drivers/nvme/host/multipath.c  | 29 --
 drivers/nvme/host/nvme.h   |  9 +++
 4 files changed, 48 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-block-nvme

diff --git a/Documentation/ABI/testing/sysfs-block-nvme 
b/Documentation/ABI/testing/sysfs-block-nvme
new file mode 100644
index ..3fe51b7be1e1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-block-nvme
@@ -0,0 +1,10 @@
+What:  /sys/block/nvme*/paths
+Date:  Oct, 2019
+KernelVersion: v4.21
+Contact:   Thadeu Lima de Souza Cascardo 
+Description:
+   This is a directory containing symlinks to other block
+   devices, when the block device is a nvme multipath
+   device.
+Users: initramfs-tools
+
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9e4a30b05bd2..06be47e878f5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3115,6 +3115,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid)
device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups);
 
nvme_mpath_add_disk(ns, id);
+   nvme_mpath_add_disk_links(ns);
nvme_fault_inject_init(ns);
kfree(id);
 
@@ -3138,6 +3139,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
nvme_fault_inject_fini(ns);
if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
+   nvme_mpath_remove_disk_links(ns);
del_gendisk(ns->disk);
blk_cleanup_queue(ns->queue);
if (blk_get_integrity(ns->disk))
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 5e3cc8c59a39..65dabe7d6d7c 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -317,9 +317,12 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
if (!head->disk)
return;
 
-   if (!(head->disk->flags & GENHD_FL_UP))
+   if (!(head->disk->flags & GENHD_FL_UP)) {
+   struct kobject *hd_kobj = _to_dev(head->disk)->kobj;
device_add_disk(>subsys->dev, head->disk,
nvme_ns_id_attr_groups);
+   head->path_dir = kobject_create_and_add("paths", hd_kobj);
+   }
 
if (nvme_path_is_optimized(ns)) {
int node, srcu_idx;
@@ -530,6 +533,19 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct 
nvme_id_ns *id)
}
 }
 
+void nvme_mpath_add_disk_links(struct nvme_ns *ns)
+{
+   struct kobject *path_disk_kobj;
+
+   if (!ns->head->disk)
+   return;
+
+   path_disk_kobj = _to_dev(ns->disk)->kobj;
+   if (sysfs_create_link(ns->head->path_dir, path_disk_kobj,
+   kobject_name(path_disk_kobj)))
+   return;
+}
+
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
if (!head->disk)
@@ -541,9 +557,19 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
kblockd_schedule_work(>requeue_work);
flush_work(>requeue_work);
blk_cleanup_queue(head->disk->queue);
+   kobject_put(head->path_dir);
put_disk(head->disk);
 }
 
+void nvme_mpath_remove_disk_links(struct nvme_ns *ns)
+{
+   if (!ns->head->disk)
+   return;
+
+   sysfs_remove_link(ns->head->path_dir,
+   kobject_name(_to_dev(ns->disk)->kobj));
+}
+
 int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 {
int error;
@@ -593,4 +619,3 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
 {
kfree(ctrl->ana_log_buf);
 }
-
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9fefba039d1e..6093649d4696 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -287,6 +287,7 @@ struct nvme_ns_head {
int instance;
 #ifdef CONFIG_NVME_MULTIPATH
struct gendisk  *disk;
+   struct kobject

[PATCH] nvme: create 'paths' entries for hidden controllers

2018-09-28 Thread Thadeu Lima de Souza Cascardo

When using initramfs-tools with only the necessary dependencies to mount
the root filesystem, it will fail to include nvme drivers for a root on a
multipath nvme. That happens because the slaves relationship is not
present.

As discussed in [1], using slaves will break lsblk, because the slaves are
hidden from userspace, that is, they have no real block device, just an
entry under sysfs.

Introducing the paths subdir and using that on initramfs-tools makes it
possible to now boot a system with nvme multipath as root.

[1] https://www.spinics.net/lists/stable/msg222779.html

Cc: Christoph Hellwig 
Cc: Potnuri Bharat Teja 
Cc: Keith Busch 
Cc: Hannes Reinecke 
Cc: Martin K. Petersen 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 drivers/nvme/host/core.c  |  2 ++
 drivers/nvme/host/multipath.c | 28 ++--
 drivers/nvme/host/nvme.h  |  9 +
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 456d37a02ea3..6958e8cab92c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3056,6 +3056,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid)
ns->disk->disk_name);
 
nvme_mpath_add_disk(ns->head);
+   nvme_mpath_add_disk_links(ns);
nvme_fault_inject_init(ns);
return;
  out_unlink_ns:
@@ -3077,6 +3078,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
nvme_fault_inject_fini(ns);
if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
+   nvme_mpath_remove_disk_links(ns);
sysfs_remove_group(_to_dev(ns->disk)->kobj,
_ns_id_attr_group);
if (ns->ndev)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 348aa405b641..70a4a0611550 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -220,15 +220,29 @@ void nvme_mpath_add_disk(struct nvme_ns_head *head)
 
mutex_lock(>subsys->lock);
if (!(head->disk->flags & GENHD_FL_UP)) {
+   struct kobject *hd_kobj = _to_dev(head->disk)->kobj;
device_add_disk(>subsys->dev, head->disk);
-   if (sysfs_create_group(_to_dev(head->disk)->kobj,
-   _ns_id_attr_group))
+   head->path_dir = kobject_create_and_add("paths", hd_kobj);
+   if (sysfs_create_group(hd_kobj, _ns_id_attr_group))
pr_warn("%s: failed to create sysfs group for 
identification\n",
head->disk->disk_name);
}
mutex_unlock(>subsys->lock);
 }
 
+void nvme_mpath_add_disk_links(struct nvme_ns *ns)
+{
+   struct kobject *path_disk_kobj;
+
+   if (!ns->head->disk)
+   return;
+
+   path_disk_kobj = _to_dev(ns->disk)->kobj;
+   if (sysfs_create_link(ns->head->path_dir, path_disk_kobj,
+   kobject_name(path_disk_kobj)))
+   return;
+}
+
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
if (!head->disk)
@@ -241,5 +255,15 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
kblockd_schedule_work(>requeue_work);
flush_work(>requeue_work);
blk_cleanup_queue(head->disk->queue);
+   kobject_put(head->path_dir);
put_disk(head->disk);
 }
+
+void nvme_mpath_remove_disk_links(struct nvme_ns *ns)
+{
+   if (!ns->head->disk)
+   return;
+
+   sysfs_remove_link(ns->head->path_dir,
+   kobject_name(_to_dev(ns->disk)->kobj));
+}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07452adef110..8371f98b657e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -265,6 +265,7 @@ struct nvme_ns_ids {
 struct nvme_ns_head {
 #ifdef CONFIG_NVME_MULTIPATH
struct gendisk  *disk;
+   struct kobject  *path_dir;
struct nvme_ns __rcu*current_path;
struct bio_list requeue_list;
spinlock_t  requeue_lock;
@@ -456,7 +457,9 @@ void nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns_head *head);
+void nvme_mpath_add_disk_links(struct nvme_ns *ns);
 void nvme_mpath_remove_disk(struct nvme_ns_head *head);
+void nvme_mpath_remove_disk_links(struct nvme_ns *ns);
 
 static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {
@@ -503,6 +506,12 @@ static inline void nvme_mpath_add_disk(struct nvme_ns_head 
*head)
 static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
 }
+static inline void nvme_mpath_add_disk_links(s

[PATCH] nvme: create 'paths' entries for hidden controllers

2018-09-28 Thread Thadeu Lima de Souza Cascardo

When using initramfs-tools with only the necessary dependencies to mount
the root filesystem, it will fail to include nvme drivers for a root on a
multipath nvme. That happens because the slaves relationship is not
present.

As discussed in [1], using slaves will break lsblk, because the slaves are
hidden from userspace, that is, they have no real block device, just an
entry under sysfs.

Introducing the paths subdir and using that on initramfs-tools makes it
possible to now boot a system with nvme multipath as root.

[1] https://www.spinics.net/lists/stable/msg222779.html

Cc: Christoph Hellwig 
Cc: Potnuri Bharat Teja 
Cc: Keith Busch 
Cc: Hannes Reinecke 
Cc: Martin K. Petersen 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 drivers/nvme/host/core.c  |  2 ++
 drivers/nvme/host/multipath.c | 28 ++--
 drivers/nvme/host/nvme.h  |  9 +
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 456d37a02ea3..6958e8cab92c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3056,6 +3056,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid)
ns->disk->disk_name);
 
nvme_mpath_add_disk(ns->head);
+   nvme_mpath_add_disk_links(ns);
nvme_fault_inject_init(ns);
return;
  out_unlink_ns:
@@ -3077,6 +3078,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
nvme_fault_inject_fini(ns);
if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
+   nvme_mpath_remove_disk_links(ns);
sysfs_remove_group(_to_dev(ns->disk)->kobj,
_ns_id_attr_group);
if (ns->ndev)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 348aa405b641..70a4a0611550 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -220,15 +220,29 @@ void nvme_mpath_add_disk(struct nvme_ns_head *head)
 
mutex_lock(>subsys->lock);
if (!(head->disk->flags & GENHD_FL_UP)) {
+   struct kobject *hd_kobj = _to_dev(head->disk)->kobj;
device_add_disk(>subsys->dev, head->disk);
-   if (sysfs_create_group(_to_dev(head->disk)->kobj,
-   _ns_id_attr_group))
+   head->path_dir = kobject_create_and_add("paths", hd_kobj);
+   if (sysfs_create_group(hd_kobj, _ns_id_attr_group))
pr_warn("%s: failed to create sysfs group for 
identification\n",
head->disk->disk_name);
}
mutex_unlock(>subsys->lock);
 }
 
+void nvme_mpath_add_disk_links(struct nvme_ns *ns)
+{
+   struct kobject *path_disk_kobj;
+
+   if (!ns->head->disk)
+   return;
+
+   path_disk_kobj = _to_dev(ns->disk)->kobj;
+   if (sysfs_create_link(ns->head->path_dir, path_disk_kobj,
+   kobject_name(path_disk_kobj)))
+   return;
+}
+
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
if (!head->disk)
@@ -241,5 +255,15 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
kblockd_schedule_work(>requeue_work);
flush_work(>requeue_work);
blk_cleanup_queue(head->disk->queue);
+   kobject_put(head->path_dir);
put_disk(head->disk);
 }
+
+void nvme_mpath_remove_disk_links(struct nvme_ns *ns)
+{
+   if (!ns->head->disk)
+   return;
+
+   sysfs_remove_link(ns->head->path_dir,
+   kobject_name(_to_dev(ns->disk)->kobj));
+}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07452adef110..8371f98b657e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -265,6 +265,7 @@ struct nvme_ns_ids {
 struct nvme_ns_head {
 #ifdef CONFIG_NVME_MULTIPATH
struct gendisk  *disk;
+   struct kobject  *path_dir;
struct nvme_ns __rcu*current_path;
struct bio_list requeue_list;
spinlock_t  requeue_lock;
@@ -456,7 +457,9 @@ void nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns_head *head);
+void nvme_mpath_add_disk_links(struct nvme_ns *ns);
 void nvme_mpath_remove_disk(struct nvme_ns_head *head);
+void nvme_mpath_remove_disk_links(struct nvme_ns *ns);
 
 static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {
@@ -503,6 +506,12 @@ static inline void nvme_mpath_add_disk(struct nvme_ns_head 
*head)
 static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
 }
+static inline void nvme_mpath_add_disk_links(s

Re: [PATCH] ftrace: use non-archaic spelling of failes

2018-05-31 Thread Thadeu Lima de Souza Cascardo

On Thu, May 31, 2018 at 08:15:45AM -0700, Joe Perches wrote:
> On Thu, 2018-05-31 at 09:17 -0300, Thadeu Lima de Souza Cascardo wrote:
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> []
> > @@ -4788,7 +4788,7 @@ ftrace_set_addr(struct ftrace_ops *ops, unsigned long 
> > ip, int remove,
> >   * @reset - non zero to reset all filters before applying this filter.
> >   *
> >   * Filters denote which functions should be enabled when tracing is enabled
> > - * If @ip is NULL, it failes to update filter.
> > + * If @ip is NULL, it fails to update filter.
> >   */
> >  int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
> >  int remove, int reset)
> 
> Perhaps fix it treewide?
> 
> $ git grep -w -i failes | wc -l
> 8
> 

I sent patches to the respective maintainers for all of these.

Cascardo.

Re: [PATCH] ftrace: use non-archaic spelling of failes

2018-05-31 Thread Thadeu Lima de Souza Cascardo

On Thu, May 31, 2018 at 08:15:45AM -0700, Joe Perches wrote:
> On Thu, 2018-05-31 at 09:17 -0300, Thadeu Lima de Souza Cascardo wrote:
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> []
> > @@ -4788,7 +4788,7 @@ ftrace_set_addr(struct ftrace_ops *ops, unsigned long 
> > ip, int remove,
> >   * @reset - non zero to reset all filters before applying this filter.
> >   *
> >   * Filters denote which functions should be enabled when tracing is enabled
> > - * If @ip is NULL, it failes to update filter.
> > + * If @ip is NULL, it fails to update filter.
> >   */
> >  int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
> >  int remove, int reset)
> 
> Perhaps fix it treewide?
> 
> $ git grep -w -i failes | wc -l
> 8
> 

I sent patches to the respective maintainers for all of these.

Cascardo.

[PATCH] ftrace: use non-archaic spelling of failes

2018-05-31 Thread Thadeu Lima de Souza Cascardo

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8d83bcf9ef69..b218cee1f18c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4788,7 +4788,7 @@ ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, 
int remove,
  * @reset - non zero to reset all filters before applying this filter.
  *
  * Filters denote which functions should be enabled when tracing is enabled
- * If @ip is NULL, it failes to update filter.
+ * If @ip is NULL, it fails to update filter.
  */
 int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
 int remove, int reset)
-- 
2.17.0

[PATCH] ftrace: use non-archaic spelling of failes

2018-05-31 Thread Thadeu Lima de Souza Cascardo

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8d83bcf9ef69..b218cee1f18c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4788,7 +4788,7 @@ ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, 
int remove,
  * @reset - non zero to reset all filters before applying this filter.
  *
  * Filters denote which functions should be enabled when tracing is enabled
- * If @ip is NULL, it failes to update filter.
+ * If @ip is NULL, it fails to update filter.
  */
 int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
 int remove, int reset)
-- 
2.17.0

Re: [PATCH] fs/binfmt_misc.c: do not allow offset overflow

2018-05-29 Thread Thadeu Lima de Souza Cascardo

On Tue, May 29, 2018 at 03:08:54PM -0700, Andrew Morton wrote:
> On Tue, 29 May 2018 10:56:48 -0300 Thadeu Lima de Souza Cascardo 
>  wrote:
> 
> > It's possible to overflow the offset to get a negative value, which might
> > crash the system, or possibly leak kernel data.
> 
> I think the missing information here is "when registering a new
> binfmt_misc binary type", yes?
> 

Yes, when registering a new type.

[...]
> > Cc: sta...@vger.kernel.org
> 
> Registering a handler is a priveleged operation.  As such, I don't
> think a -stable backport is needed?
> 

Not when we take containers in mind. We might question the permission to mount
a binfmt_misc inside a container, that may already have left open other ways of
exploiting the system. But I would rather see this closed on my stable systems.

Cascardo.

Re: [PATCH] fs/binfmt_misc.c: do not allow offset overflow

2018-05-29 Thread Thadeu Lima de Souza Cascardo

On Tue, May 29, 2018 at 03:08:54PM -0700, Andrew Morton wrote:
> On Tue, 29 May 2018 10:56:48 -0300 Thadeu Lima de Souza Cascardo 
>  wrote:
> 
> > It's possible to overflow the offset to get a negative value, which might
> > crash the system, or possibly leak kernel data.
> 
> I think the missing information here is "when registering a new
> binfmt_misc binary type", yes?
> 

Yes, when registering a new type.

[...]
> > Cc: sta...@vger.kernel.org
> 
> Registering a handler is a priveleged operation.  As such, I don't
> think a -stable backport is needed?
> 

Not when we take containers in mind. We might question the permission to mount
a binfmt_misc inside a container, that may already have left open other ways of
exploiting the system. But I would rather see this closed on my stable systems.

Cascardo.

[PATCH] fs/binfmt_misc.c: do not allow offset overflow

2018-05-29 Thread Thadeu Lima de Souza Cascardo

It's possible to overflow the offset to get a negative value, which might
crash the system, or possibly leak kernel data.

Here is a crash log when using 25 as offset:

[ 6050.251552] BUG: unable to handle kernel paging request at 989cfd6edca0
[ 6050.252053] IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
[ 6050.252053] PGD 1ef3e067 P4D 1ef3e067 PUD 0
[ 6050.252053] Oops:  [#1] SMP NOPTI
[ 6050.252053] Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass 
joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
[ 6050.252053] CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic 
#24-Ubuntu
[ 6050.252053] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.11.1-1 04/01/2014
[ 6050.252053] RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
[ 6050.252053] RSP: 0018:b6e383017e18 EFLAGS: 00010202
[ 6050.252053] RAX: 0003 RBX: 989d74a47100 RCX: 989cfd6edca0
[ 6050.252053] RDX:  RSI:  RDI: 989d7d2e95e5
[ 6050.252053] RBP: b6e383017e48 R08: 0001 R09: 
[ 6050.252053] R10:  R11: fefefefefefefeff R12: 0001
[ 6050.252053] R13: 989d7d2e9580 R14:  R15: c0592160
[ 6050.252053] FS:  7fa424c89740() GS:989d7fc0() 
knlGS:
[ 6050.252053] CS:  0010 DS:  ES:  CR0: 80050033
[ 6050.252053] CR2: 989cfd6edca0 CR3: 3db08000 CR4: 06f0
[ 6050.252053] Call Trace:
[ 6050.252053]  search_binary_handler+0x97/0x1d0
[ 6050.252053]  do_execveat_common.isra.34+0x667/0x810
[ 6050.252053]  SyS_execve+0x31/0x40
[ 6050.252053]  do_syscall_64+0x73/0x130
[ 6050.252053]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Use kstrtoint instead of simple_strtoul. It will work as the code already
set the delimiter byte to '\0' and we only do it when the field is not
empty.

Tested with offsets -1, 25, UINT_MAX and INT_MAX. Also tested with
examples documented at Documentation/admin-guide/binfmt-misc.rst and other
registrations from packages on Ubuntu.

Signed-off-by: Thadeu Lima de Souza Cascardo 
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: sta...@vger.kernel.org
---
 fs/binfmt_misc.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index a41b48f82a70..4de191563261 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -387,8 +387,13 @@ static Node *create_entry(const char __user *buffer, 
size_t count)
s = strchr(p, del);
if (!s)
goto einval;
-   *s++ = '\0';
-   e->offset = simple_strtoul(p, , 10);
+   *s = '\0';
+   if (p != s) {
+   int r = kstrtoint(p, 10, >offset);
+   if (r != 0 || e->offset < 0)
+   goto einval;
+   }
+   p = s;
if (*p++)
goto einval;
pr_debug("register: offset: %#x\n", e->offset);
@@ -428,7 +433,8 @@ static Node *create_entry(const char __user *buffer, size_t 
count)
if (e->mask &&
string_unescape_inplace(e->mask, UNESCAPE_HEX) != e->size)
goto einval;
-   if (e->size + e->offset > BINPRM_BUF_SIZE)
+   if (e->size > BINPRM_BUF_SIZE ||
+   BINPRM_BUF_SIZE - e->size < e->offset)
goto einval;
pr_debug("register: magic/mask length: %i\n", e->size);
if (USE_DEBUG) {
-- 
2.17.0

[PATCH] fs/binfmt_misc.c: do not allow offset overflow

2018-05-29 Thread Thadeu Lima de Souza Cascardo

It's possible to overflow the offset to get a negative value, which might
crash the system, or possibly leak kernel data.

Here is a crash log when using 25 as offset:

[ 6050.251552] BUG: unable to handle kernel paging request at 989cfd6edca0
[ 6050.252053] IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
[ 6050.252053] PGD 1ef3e067 P4D 1ef3e067 PUD 0
[ 6050.252053] Oops:  [#1] SMP NOPTI
[ 6050.252053] Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass 
joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
[ 6050.252053] CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic 
#24-Ubuntu
[ 6050.252053] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.11.1-1 04/01/2014
[ 6050.252053] RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
[ 6050.252053] RSP: 0018:b6e383017e18 EFLAGS: 00010202
[ 6050.252053] RAX: 0003 RBX: 989d74a47100 RCX: 989cfd6edca0
[ 6050.252053] RDX:  RSI:  RDI: 989d7d2e95e5
[ 6050.252053] RBP: b6e383017e48 R08: 0001 R09: 
[ 6050.252053] R10:  R11: fefefefefefefeff R12: 0001
[ 6050.252053] R13: 989d7d2e9580 R14:  R15: c0592160
[ 6050.252053] FS:  7fa424c89740() GS:989d7fc0() 
knlGS:
[ 6050.252053] CS:  0010 DS:  ES:  CR0: 80050033
[ 6050.252053] CR2: 989cfd6edca0 CR3: 3db08000 CR4: 06f0
[ 6050.252053] Call Trace:
[ 6050.252053]  search_binary_handler+0x97/0x1d0
[ 6050.252053]  do_execveat_common.isra.34+0x667/0x810
[ 6050.252053]  SyS_execve+0x31/0x40
[ 6050.252053]  do_syscall_64+0x73/0x130
[ 6050.252053]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Use kstrtoint instead of simple_strtoul. It will work as the code already
set the delimiter byte to '\0' and we only do it when the field is not
empty.

Tested with offsets -1, 25, UINT_MAX and INT_MAX. Also tested with
examples documented at Documentation/admin-guide/binfmt-misc.rst and other
registrations from packages on Ubuntu.

Signed-off-by: Thadeu Lima de Souza Cascardo 
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: sta...@vger.kernel.org
---
 fs/binfmt_misc.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index a41b48f82a70..4de191563261 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -387,8 +387,13 @@ static Node *create_entry(const char __user *buffer, 
size_t count)
s = strchr(p, del);
if (!s)
goto einval;
-   *s++ = '\0';
-   e->offset = simple_strtoul(p, , 10);
+   *s = '\0';
+   if (p != s) {
+   int r = kstrtoint(p, 10, >offset);
+   if (r != 0 || e->offset < 0)
+   goto einval;
+   }
+   p = s;
if (*p++)
goto einval;
pr_debug("register: offset: %#x\n", e->offset);
@@ -428,7 +433,8 @@ static Node *create_entry(const char __user *buffer, size_t 
count)
if (e->mask &&
string_unescape_inplace(e->mask, UNESCAPE_HEX) != e->size)
goto einval;
-   if (e->size + e->offset > BINPRM_BUF_SIZE)
+   if (e->size > BINPRM_BUF_SIZE ||
+   BINPRM_BUF_SIZE - e->size < e->offset)
goto einval;
pr_debug("register: magic/mask length: %i\n", e->size);
if (USE_DEBUG) {
-- 
2.17.0

Re: Problem with global pages changeset and kvm

2018-05-08 Thread Thadeu Lima de Souza Cascardo

On Tue, May 08, 2018 at 07:15:06AM -0700, Dave Hansen wrote:
> Thanks for the excellent bug report!
> 
> On 05/08/2018 02:37 AM, Thadeu Lima de Souza Cascardo wrote:
> > 2) The bad address is next to do_syscall_64 on the host.
> 
> So a host address leaked into a guest oops?
> 
> We should bring the KVM folks into this and probably also need to widen
> the cc list quite a bit.
> 
> Can you boot the guest at all?

No, there are multiple oopses, the last one includes init, which makes the
guest panic. I can boot it if I add nopti to the guest command line.

Cascardo.

Re: Problem with global pages changeset and kvm

2018-05-08 Thread Thadeu Lima de Souza Cascardo

On Tue, May 08, 2018 at 07:15:06AM -0700, Dave Hansen wrote:
> Thanks for the excellent bug report!
> 
> On 05/08/2018 02:37 AM, Thadeu Lima de Souza Cascardo wrote:
> > 2) The bad address is next to do_syscall_64 on the host.
> 
> So a host address leaked into a guest oops?
> 
> We should bring the KVM folks into this and probably also need to widen
> the cc list quite a bit.
> 
> Can you boot the guest at all?

No, there are multiple oopses, the last one includes init, which makes the
guest panic. I can boot it if I add nopti to the guest command line.

Cascardo.

Problem with global pages changeset and kvm

2018-05-08 Thread Thadeu Lima de Souza Cascardo

When running a 4.15 kernel on top of 4.17-rc3, I noticed a problem on the guest:

[4.836637] BUG: unable to handle kernel NULL pointer dereference at 

[4.839290] IP: 0x8a00147e
[4.840300] PGD 0 P4D 0
[4.840510] Oops:  [#1] SMP PTI
[4.840510] Modules linked in: psmouse e1000 i2c_piix4 pata_acpi floppy
[4.840510] CPU: 0 PID: 177 Comm: exe Not tainted 4.15.0-20-generic 
#21-Ubuntu
[4.840510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[4.840510] RIP: 0010:0x8a00147e
[4.840510] RSP: 0018:9ea680413ee0 EFLAGS: 00010246
[4.840510] RAX:  RBX: 9ea680413f58 RCX: 
[4.840510] RDX:  RSI: 9ea680413f58 RDI: 00e7
[4.840510] RBP: 9ea680413f48 R08:  R09: 
[4.840510] R10:  R11:  R12: 00e7
[4.840510] R13:  R14:  R15: 
[4.840510] FS:  7f42a6ea7580() GS:91513c80() 
knlGS:
[4.840510] CS:  0010 DS:  ES:  CR0: 80050033
[4.840510] CR2: 8a00147e CR3: 3f84e000 CR4: 06f0
[4.840510] Call Trace:
[4.840510]  ? SyS_nanosleep+0x72/0xa0
[4.840510] Code:  Bad RIP value.
[4.840510] RIP: 0x8a00147e RSP: 9ea680413ee0
[4.840510] CR2: 
[4.898894] ---[ end trace f77f825085f5973c ]---


After a bisection and a little investigation, I realized:

1) The first commit where it happens is
0f561fce4d6979a50415616896512f87a6d1d5c8 ("x86/pti: Enable global pages for
shared areas"). Though reverting it on top of 4.17-rc3 will cause other
problems.

2) The bad address is next to do_syscall_64 on the host.

3) I have a non-PCID host, likely:
model name  : Intel(R) Core(TM)2 CPU P8600  @ 2.40GHz
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory 
Controller Hub (rev 07)

4) On the host, I also see:
[48162.554505] [ cut here ]
[48162.554512] Bad FPU state detected at __switch_to+0x1d7/0x3a0, 
reinitializing FPU registers.
[48162.554518] WARNING: CPU: 1 PID: 0 at arch/x86/mm/extable.c:104 
ex_handler_fprestore+0x60/0x70
[48162.554519] Modules linked in: ccm iptable_filter arc4 binfmt_misc 
ip6table_filter ip6_tables kvm_intel kvm irqbypass input_leds ath5k mac80211 
ath cfg80211 thinkpad_acpi hwmon nvram battery ac acpi_cpufreq ip_tables 
x_tables dm_crypt psmouse ahci libahci i915 e1000e video intel_gtt i2c_algo_bit 
drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt 
fb_sys_fops cfbcopyarea drm drm_panel_orientation_quirks
[48162.554551] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 
4.17.0-rc2-3-ga44ca8f5a30c #17
[48162.554552] Hardware name: LENOVO 7458CJ3/7458CJ3, BIOS CBET4000 3774c98 
09/07/2016
[48162.554555] RIP: 0010:ex_handler_fprestore+0x60/0x70
[48162.554556] RSP: 0018:a5f88186b818 EFLAGS: 00010086
[48162.554558] RAX:  RBX: a5f88186b878 RCX: 8ae226b8
[48162.554559] RDX: 0001 RSI: 0086 RDI: 8af8a64c
[48162.554560] RBP: a5f88186b818 R08: 025e R09: 8af8caa0
[48162.554561] R10:  R11:  R12: 000d
[48162.554562] R13: 960266cf0b80 R14:  R15: 
[48162.554564] FS:  7f304bd72580() GS:96026fd0() 
knlGS:
[48162.554565] CS:  0010 DS:  ES:  CR0: 80050033
[48162.554567] CR2: 7f3ae3f5c00c CR3: 000168482000 CR4: 000426a0
[48162.554567] Call Trace:
[48162.554569] Code: 01 00 00 00 5d c3 48 0f ae 0d cd 49 e4 00 b8 01 00 00 00 
5d c3 48 89 c6 48 c7 c7 00 ba b9 8a c6 05 ba b8 e2 00 01 e8 20 bf 00 00 <0f> 0b 
eb b9 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8
[48162.554605] ---[ end trace 0107e9bc595237bb ]---

5) When disabling pti on the guest, the failure goes away. It also happens with
a 4.16, or 4.17-rc2 kernel, so not specific to the 4.15 Ubuntu kernel on the 
guest.

Let me know how I can help investigate this further, or test fixes for this.

Cascardo.

Problem with global pages changeset and kvm

2018-05-08 Thread Thadeu Lima de Souza Cascardo

When running a 4.15 kernel on top of 4.17-rc3, I noticed a problem on the guest:

[4.836637] BUG: unable to handle kernel NULL pointer dereference at 

[4.839290] IP: 0x8a00147e
[4.840300] PGD 0 P4D 0
[4.840510] Oops:  [#1] SMP PTI
[4.840510] Modules linked in: psmouse e1000 i2c_piix4 pata_acpi floppy
[4.840510] CPU: 0 PID: 177 Comm: exe Not tainted 4.15.0-20-generic 
#21-Ubuntu
[4.840510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[4.840510] RIP: 0010:0x8a00147e
[4.840510] RSP: 0018:9ea680413ee0 EFLAGS: 00010246
[4.840510] RAX:  RBX: 9ea680413f58 RCX: 
[4.840510] RDX:  RSI: 9ea680413f58 RDI: 00e7
[4.840510] RBP: 9ea680413f48 R08:  R09: 
[4.840510] R10:  R11:  R12: 00e7
[4.840510] R13:  R14:  R15: 
[4.840510] FS:  7f42a6ea7580() GS:91513c80() 
knlGS:
[4.840510] CS:  0010 DS:  ES:  CR0: 80050033
[4.840510] CR2: 8a00147e CR3: 3f84e000 CR4: 06f0
[4.840510] Call Trace:
[4.840510]  ? SyS_nanosleep+0x72/0xa0
[4.840510] Code:  Bad RIP value.
[4.840510] RIP: 0x8a00147e RSP: 9ea680413ee0
[4.840510] CR2: 
[4.898894] ---[ end trace f77f825085f5973c ]---


After a bisection and a little investigation, I realized:

1) The first commit where it happens is
0f561fce4d6979a50415616896512f87a6d1d5c8 ("x86/pti: Enable global pages for
shared areas"). Though reverting it on top of 4.17-rc3 will cause other
problems.

2) The bad address is next to do_syscall_64 on the host.

3) I have a non-PCID host, likely:
model name  : Intel(R) Core(TM)2 CPU P8600  @ 2.40GHz
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory 
Controller Hub (rev 07)

4) On the host, I also see:
[48162.554505] [ cut here ]
[48162.554512] Bad FPU state detected at __switch_to+0x1d7/0x3a0, 
reinitializing FPU registers.
[48162.554518] WARNING: CPU: 1 PID: 0 at arch/x86/mm/extable.c:104 
ex_handler_fprestore+0x60/0x70
[48162.554519] Modules linked in: ccm iptable_filter arc4 binfmt_misc 
ip6table_filter ip6_tables kvm_intel kvm irqbypass input_leds ath5k mac80211 
ath cfg80211 thinkpad_acpi hwmon nvram battery ac acpi_cpufreq ip_tables 
x_tables dm_crypt psmouse ahci libahci i915 e1000e video intel_gtt i2c_algo_bit 
drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt 
fb_sys_fops cfbcopyarea drm drm_panel_orientation_quirks
[48162.554551] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 
4.17.0-rc2-3-ga44ca8f5a30c #17
[48162.554552] Hardware name: LENOVO 7458CJ3/7458CJ3, BIOS CBET4000 3774c98 
09/07/2016
[48162.554555] RIP: 0010:ex_handler_fprestore+0x60/0x70
[48162.554556] RSP: 0018:a5f88186b818 EFLAGS: 00010086
[48162.554558] RAX:  RBX: a5f88186b878 RCX: 8ae226b8
[48162.554559] RDX: 0001 RSI: 0086 RDI: 8af8a64c
[48162.554560] RBP: a5f88186b818 R08: 025e R09: 8af8caa0
[48162.554561] R10:  R11:  R12: 000d
[48162.554562] R13: 960266cf0b80 R14:  R15: 
[48162.554564] FS:  7f304bd72580() GS:96026fd0() 
knlGS:
[48162.554565] CS:  0010 DS:  ES:  CR0: 80050033
[48162.554567] CR2: 7f3ae3f5c00c CR3: 000168482000 CR4: 000426a0
[48162.554567] Call Trace:
[48162.554569] Code: 01 00 00 00 5d c3 48 0f ae 0d cd 49 e4 00 b8 01 00 00 00 
5d c3 48 89 c6 48 c7 c7 00 ba b9 8a c6 05 ba b8 e2 00 01 e8 20 bf 00 00 <0f> 0b 
eb b9 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8
[48162.554605] ---[ end trace 0107e9bc595237bb ]---

5) When disabling pti on the guest, the failure goes away. It also happens with
a 4.16, or 4.17-rc2 kernel, so not specific to the 4.15 Ubuntu kernel on the 
guest.

Let me know how I can help investigate this further, or test fixes for this.

Cascardo.

Re: [PATCH 4.15 00/72] 4.15.16-stable review

2018-04-06 Thread Thadeu Lima de Souza Cascardo

On Fri, Apr 06, 2018 at 03:23:35PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.15.16 release.
> There are 72 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Apr  8 08:43:10 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.16-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

Merged with bionic, built successfully on amd64, i386, s390x, armhf and
ppc64el. I have some problems with the arm64 cross toolchain, not related to
the kernel, so builds for arm64 have been failing. So I can report the status
for that right now.

Boot tested on ppc64el on a Unicamp minicloud VM.

Using it without any problems on my Thinkpad X200 with the amd64 kernel.

Cascardo.

Re: [PATCH 4.15 00/72] 4.15.16-stable review

2018-04-06 Thread Thadeu Lima de Souza Cascardo

On Fri, Apr 06, 2018 at 03:23:35PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.15.16 release.
> There are 72 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun Apr  8 08:43:10 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.16-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

Merged with bionic, built successfully on amd64, i386, s390x, armhf and
ppc64el. I have some problems with the arm64 cross toolchain, not related to
the kernel, so builds for arm64 have been failing. So I can report the status
for that right now.

Boot tested on ppc64el on a Unicamp minicloud VM.

Using it without any problems on my Thinkpad X200 with the amd64 kernel.

Cascardo.

Re: [PATCH 4.15 000/105] 4.15.14-stable review

2018-03-28 Thread Thadeu Lima de Souza Cascardo

On Tue, Mar 27, 2018 at 06:26:40PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.15.14 release.
> There are 105 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Mar 29 16:27:29 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.14-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

Merged with bionic kernel, built successfully on amd64, arm64, armhf, i386,
ppc64el, s390x.

Survived a kernel build on ppc64el (KVM VM) and amd64 (Thinkpad X200).

Cascardo.

Re: [PATCH 4.15 000/105] 4.15.14-stable review

2018-03-28 Thread Thadeu Lima de Souza Cascardo

On Tue, Mar 27, 2018 at 06:26:40PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.15.14 release.
> There are 105 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Mar 29 16:27:29 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.14-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

Merged with bionic kernel, built successfully on amd64, arm64, armhf, i386,
ppc64el, s390x.

Survived a kernel build on ppc64el (KVM VM) and amd64 (Thinkpad X200).

Cascardo.

Re: [PATCH net-next v2 2/2] cxgb4: collect hardware dump in second kernel

2018-03-24 Thread Thadeu Lima de Souza Cascardo

On Sat, Mar 24, 2018 at 04:26:34PM +0530, Rahul Lakkireddy wrote:
> Register callback to collect hardware/firmware dumps in second kernel
> before hardware/firmware is initialized.  The dumps for each device
> will be available under /sys/kernel/crashdd/cxgb4/ directory in second
> kernel.
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Ganesh Goudar 
> ---
> v2:
> - No Changes.
> 
> Changes since rfc v2:
> - Update comments and commit message for sysfs change.
> 
> rfc v2:
> - Updated dump registration to the new API in patch 1.
[...]
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index e880be8e3c45..265cb026f868 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -5527,6 +5527,18 @@ static int init_one(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
>   if (err)
>   goto out_free_adapter;
>  
> + if (is_kdump_kernel()) {
> + /* Collect hardware state and append to
> +  * /sys/kernel/crashdd/cxgb4/ directory
> +  */
> + err = cxgb4_cudbg_crashdd_add_dump(adapter);
> + if (err) {
> + dev_warn(adapter->pdev_dev,
> +  "Fail collecting crash device dump, err: %d. 
> Continuing\n",
> +  err);
> + err = 0;
> + }
> + }
>  

The problem I see with this approach is that you require that the driver
is built into the kdump kernel (or present as a module in the kdump
initramfs), and that you will probe the device during the collection of
the dumps.

IMHO, if you are going to require the device to be probed by the same
driver during kdump, you might just as well use the device object itself
to present the crash data. I think that's what Stephen Hemminger meant
when he said to use sysfs. No need at all for any special crashdd. Just
add an attribute or attribute group to the device object.

Otherwise, as Eric Biederman pointed out, you should just add that data
into the vmcore before you kexec, so you don't even need to look at a
different file, and the driver does not even need to be present in the
kdump kernel.

Cascardo.

>   if (!is_t4(adapter->params.chip)) {
>   s_qpp = (QUEUESPERPAGEPF0_S +
> -- 
> 2.14.1

Re: [PATCH net-next v2 2/2] cxgb4: collect hardware dump in second kernel

2018-03-24 Thread Thadeu Lima de Souza Cascardo

On Sat, Mar 24, 2018 at 04:26:34PM +0530, Rahul Lakkireddy wrote:
> Register callback to collect hardware/firmware dumps in second kernel
> before hardware/firmware is initialized.  The dumps for each device
> will be available under /sys/kernel/crashdd/cxgb4/ directory in second
> kernel.
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Ganesh Goudar 
> ---
> v2:
> - No Changes.
> 
> Changes since rfc v2:
> - Update comments and commit message for sysfs change.
> 
> rfc v2:
> - Updated dump registration to the new API in patch 1.
[...]
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index e880be8e3c45..265cb026f868 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -5527,6 +5527,18 @@ static int init_one(struct pci_dev *pdev, const struct 
> pci_device_id *ent)
>   if (err)
>   goto out_free_adapter;
>  
> + if (is_kdump_kernel()) {
> + /* Collect hardware state and append to
> +  * /sys/kernel/crashdd/cxgb4/ directory
> +  */
> + err = cxgb4_cudbg_crashdd_add_dump(adapter);
> + if (err) {
> + dev_warn(adapter->pdev_dev,
> +  "Fail collecting crash device dump, err: %d. 
> Continuing\n",
> +  err);
> + err = 0;
> + }
> + }
>  

The problem I see with this approach is that you require that the driver
is built into the kdump kernel (or present as a module in the kdump
initramfs), and that you will probe the device during the collection of
the dumps.

IMHO, if you are going to require the device to be probed by the same
driver during kdump, you might just as well use the device object itself
to present the crash data. I think that's what Stephen Hemminger meant
when he said to use sysfs. No need at all for any special crashdd. Just
add an attribute or attribute group to the device object.

Otherwise, as Eric Biederman pointed out, you should just add that data
into the vmcore before you kexec, so you don't even need to look at a
different file, and the driver does not even need to be present in the
kdump kernel.

Cascardo.

>   if (!is_t4(adapter->params.chip)) {
>   s_qpp = (QUEUESPERPAGEPF0_S +
> -- 
> 2.14.1

Re: [PATCH] test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches

2018-03-20 Thread Thadeu Lima de Souza Cascardo

On Tue, Mar 20, 2018 at 09:05:15AM -0700, Yonghong Song wrote:
> 
> 
> On 3/20/18 5:58 AM, Thadeu Lima de Souza Cascardo wrote:
> > Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
> > x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
> > commit 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when
> > CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
> > case.
> > 
> > However, it does not fail on other architectures, which have a different
> > JIT compiler design. So, test_bpf has started to fail to load on those.
> 
> Here, you mentioned that it did not fail on other architectures. Have you
> verified all of them or just looked through the algorithm.

>From our testing, I know at least I get an UNEXPECTED_PASS on arm64, arm, s390x
and ppc64le. i386 doesn't have JIT, so it doesn't have
CONFIG_BPF_JIT_ALWAYS_ON=y.

> 
> Could you give a little bit details about other architectures are okay while
> x86 is not? Maybe, x86 JIT can be improved some how?

As the comment on that functions says:

/* Hits 70 passes on x86_64, so cannot get JITed there. */

And looking at x86_64 JIT compiler, you will notice it's looping trying to
minimize the size of the code, limited to 10 passes. If it does not converge,
it goes back to the non-JIT code.

That's not the case on powerpc or arm, that do not do multiple passes. sparc
seem to do 3 passes, but does not seem to go back to non-JIT code.

Cascardo.

> 
> Thanks!
> 
> > 
> > After this fix, test_bpf loads fine on both x86_64 and ppc64el.
> > 
> > Fixes: 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when 
> > CONFIG_BPF_JIT_ALWAYS_ON=y")
> > Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
> > ---
> >   lib/test_bpf.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/test_bpf.c b/lib/test_bpf.c
> > index 2efb213716faa..3e9335493fe49 100644
> > --- a/lib/test_bpf.c
> > +++ b/lib/test_bpf.c
> > @@ -5467,7 +5467,7 @@ static struct bpf_test tests[] = {
> > {
> > "BPF_MAXINSNS: Jump, gap, jump, ...",
> > { },
> > -#ifdef CONFIG_BPF_JIT_ALWAYS_ON
> > +#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_X86)
> > CLASSIC | FLAG_NO_DATA | FLAG_EXPECTED_FAIL,
> >   #else
> > CLASSIC | FLAG_NO_DATA,
> >

Re: [PATCH] test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches

2018-03-20 Thread Thadeu Lima de Souza Cascardo

On Tue, Mar 20, 2018 at 09:05:15AM -0700, Yonghong Song wrote:
> 
> 
> On 3/20/18 5:58 AM, Thadeu Lima de Souza Cascardo wrote:
> > Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
> > x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
> > commit 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when
> > CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
> > case.
> > 
> > However, it does not fail on other architectures, which have a different
> > JIT compiler design. So, test_bpf has started to fail to load on those.
> 
> Here, you mentioned that it did not fail on other architectures. Have you
> verified all of them or just looked through the algorithm.

>From our testing, I know at least I get an UNEXPECTED_PASS on arm64, arm, s390x
and ppc64le. i386 doesn't have JIT, so it doesn't have
CONFIG_BPF_JIT_ALWAYS_ON=y.

> 
> Could you give a little bit details about other architectures are okay while
> x86 is not? Maybe, x86 JIT can be improved some how?

As the comment on that functions says:

/* Hits 70 passes on x86_64, so cannot get JITed there. */

And looking at x86_64 JIT compiler, you will notice it's looping trying to
minimize the size of the code, limited to 10 passes. If it does not converge,
it goes back to the non-JIT code.

That's not the case on powerpc or arm, that do not do multiple passes. sparc
seem to do 3 passes, but does not seem to go back to non-JIT code.

Cascardo.

> 
> Thanks!
> 
> > 
> > After this fix, test_bpf loads fine on both x86_64 and ppc64el.
> > 
> > Fixes: 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when 
> > CONFIG_BPF_JIT_ALWAYS_ON=y")
> > Signed-off-by: Thadeu Lima de Souza Cascardo 
> > ---
> >   lib/test_bpf.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/test_bpf.c b/lib/test_bpf.c
> > index 2efb213716faa..3e9335493fe49 100644
> > --- a/lib/test_bpf.c
> > +++ b/lib/test_bpf.c
> > @@ -5467,7 +5467,7 @@ static struct bpf_test tests[] = {
> > {
> > "BPF_MAXINSNS: Jump, gap, jump, ...",
> > { },
> > -#ifdef CONFIG_BPF_JIT_ALWAYS_ON
> > +#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_X86)
> > CLASSIC | FLAG_NO_DATA | FLAG_EXPECTED_FAIL,
> >   #else
> > CLASSIC | FLAG_NO_DATA,
> >

[PATCH] test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches

2018-03-20 Thread Thadeu Lima de Souza Cascardo

Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
commit 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when
CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
case.

However, it does not fail on other architectures, which have a different
JIT compiler design. So, test_bpf has started to fail to load on those.

After this fix, test_bpf loads fine on both x86_64 and ppc64el.

Fixes: 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when 
CONFIG_BPF_JIT_ALWAYS_ON=y")
Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
---
 lib/test_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 2efb213716faa..3e9335493fe49 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -5467,7 +5467,7 @@ static struct bpf_test tests[] = {
{
"BPF_MAXINSNS: Jump, gap, jump, ...",
{ },
-#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_X86)
CLASSIC | FLAG_NO_DATA | FLAG_EXPECTED_FAIL,
 #else
CLASSIC | FLAG_NO_DATA,
-- 
2.15.1

[PATCH] test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches

2018-03-20 Thread Thadeu Lima de Souza Cascardo

Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
commit 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when
CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
case.

However, it does not fail on other architectures, which have a different
JIT compiler design. So, test_bpf has started to fail to load on those.

After this fix, test_bpf loads fine on both x86_64 and ppc64el.

Fixes: 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when 
CONFIG_BPF_JIT_ALWAYS_ON=y")
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 lib/test_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 2efb213716faa..3e9335493fe49 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -5467,7 +5467,7 @@ static struct bpf_test tests[] = {
{
"BPF_MAXINSNS: Jump, gap, jump, ...",
{ },
-#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_X86)
CLASSIC | FLAG_NO_DATA | FLAG_EXPECTED_FAIL,
 #else
CLASSIC | FLAG_NO_DATA,
-- 
2.15.1

Re: [PATCH 4.15 00/52] 4.15.12-stable review

2018-03-20 Thread Thadeu Lima de Souza Cascardo

On Mon, Mar 19, 2018 at 07:07:58PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.15.12 release.
> There are 52 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Mar 21 18:07:15 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.12-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

Merged with Ubuntu bionic changes, build successfully on amd64, i386, ppc64el,
arm64, armhf and s390x.

Boot tested on ppc64el virtual machine and amd64 Thinkpad X200.

Cascardo.

Re: [PATCH 4.15 00/52] 4.15.12-stable review

2018-03-20 Thread Thadeu Lima de Souza Cascardo

On Mon, Mar 19, 2018 at 07:07:58PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.15.12 release.
> There are 52 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Mar 21 18:07:15 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.12-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

Merged with Ubuntu bionic changes, build successfully on amd64, i386, ppc64el,
arm64, armhf and s390x.

Boot tested on ppc64el virtual machine and amd64 Thinkpad X200.

Cascardo.

Re: [PATCH 4.15 000/122] 4.15.8-stable review

2018-03-08 Thread Thadeu Lima de Souza Cascardo

On Thu, Mar 08, 2018 at 09:27:23AM -0800, Greg Kroah-Hartman wrote:
> On Thu, Mar 08, 2018 at 07:35:41AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > On Wed, Mar 07, 2018 at 11:36:52AM -0800, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.15.8 release.
> > > There are 122 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > > 
> > > Responses should be made by Fri Mar  9 19:16:43 UTC 2018.
> > > Anything received after that time might be too late.
> > > 
> > > The whole patch series can be found in one patch at:
> > >   
> > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.8-rc1.gz
> > > or in the git tree and branch at:
> > >   
> > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > > linux-4.15.y
> > > and the diffstat can be found below.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > > 
> > > -
> > 
> > Merged, built on amd64, arm64, armhf, i386, ppc64el, s390x.
> 
> What do you mean by "merged"?

That I merged it with Ubuntu changes as in
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic.

> 
> And what are the results of the builds, do you boot them as well?  If

All builds succeeded. I was planning on reporting it if they failed, but
thought it would be useful to report that they succeeded just as well.

I have only booted the amd64 build on my own computer so far.

> you boot, are you running any tests on them?  Is this QEMU only or "real
> hardware"?
> 

We run tests on some (mostly virtual) machines for those different
architectures before they (the Ubuntu kernels) are released, but not on this
phase. In fact, we only merge them or do any build testing after they (your
stable releases) are tagged and pushed.

By my own account, I decided I would contribute somehow doing those build
tests, but might push for those tests to be run earlier, so we can report back
its results, if you find it would be useful.

Regards.
Cascardo.

> thanks,
> 
> greg k-h

Re: [PATCH 4.15 000/122] 4.15.8-stable review

2018-03-08 Thread Thadeu Lima de Souza Cascardo

On Thu, Mar 08, 2018 at 09:27:23AM -0800, Greg Kroah-Hartman wrote:
> On Thu, Mar 08, 2018 at 07:35:41AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > On Wed, Mar 07, 2018 at 11:36:52AM -0800, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.15.8 release.
> > > There are 122 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > > 
> > > Responses should be made by Fri Mar  9 19:16:43 UTC 2018.
> > > Anything received after that time might be too late.
> > > 
> > > The whole patch series can be found in one patch at:
> > >   
> > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.8-rc1.gz
> > > or in the git tree and branch at:
> > >   
> > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > > linux-4.15.y
> > > and the diffstat can be found below.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > > 
> > > -
> > 
> > Merged, built on amd64, arm64, armhf, i386, ppc64el, s390x.
> 
> What do you mean by "merged"?

That I merged it with Ubuntu changes as in
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic.

> 
> And what are the results of the builds, do you boot them as well?  If

All builds succeeded. I was planning on reporting it if they failed, but
thought it would be useful to report that they succeeded just as well.

I have only booted the amd64 build on my own computer so far.

> you boot, are you running any tests on them?  Is this QEMU only or "real
> hardware"?
> 

We run tests on some (mostly virtual) machines for those different
architectures before they (the Ubuntu kernels) are released, but not on this
phase. In fact, we only merge them or do any build testing after they (your
stable releases) are tagged and pushed.

By my own account, I decided I would contribute somehow doing those build
tests, but might push for those tests to be run earlier, so we can report back
its results, if you find it would be useful.

Regards.
Cascardo.

> thanks,
> 
> greg k-h

Re: [PATCH 4.15 000/122] 4.15.8-stable review

2018-03-08 Thread Thadeu Lima de Souza Cascardo

On Wed, Mar 07, 2018 at 11:36:52AM -0800, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.15.8 release.
> There are 122 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Mar  9 19:16:43 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.8-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 
> -

Merged, built on amd64, arm64, armhf, i386, ppc64el, s390x.

Cascardo.

Re: [PATCH 4.15 000/122] 4.15.8-stable review

2018-03-08 Thread Thadeu Lima de Souza Cascardo

On Wed, Mar 07, 2018 at 11:36:52AM -0800, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.15.8 release.
> There are 122 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Mar  9 19:16:43 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.8-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 
> -

Merged, built on amd64, arm64, armhf, i386, ppc64el, s390x.

Cascardo.

Re: [PATCH 5/6] platform/x86: make device_attribute const

2017-08-21 Thread Thadeu Lima de Souza Cascardo

For classmate-laptop.c

Acked-by: Thadeu Lima de Souza Cascardo <casca...@holoscopio.com>

Re: [PATCH 5/6] platform/x86: make device_attribute const

2017-08-21 Thread Thadeu Lima de Souza Cascardo

For classmate-laptop.c

Acked-by: Thadeu Lima de Souza Cascardo

[PATCH] tty: fix comment typo s/repsonsible/responsible/

2017-04-04 Thread Thadeu Lima de Souza Cascardo

Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@cascardo.eti.br>
---
 drivers/tty/tty_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index e6d1a65..309d250 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -3370,7 +3370,7 @@ EXPORT_SYMBOL(tty_unregister_device);
 /**
  * __tty_alloc_driver -- allocate tty driver
  * @lines: count of lines this driver can handle at most
- * @owner: module which is repsonsible for this driver
+ * @owner: module which is responsible for this driver
  * @flags: some of TTY_DRIVER_* flags, will be set in driver->flags
  *
  * This should not be called directly, some of the provided macros should be
-- 
2.10.2

[PATCH] tty: fix comment typo s/repsonsible/responsible/

2017-04-04 Thread Thadeu Lima de Souza Cascardo

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 drivers/tty/tty_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index e6d1a65..309d250 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -3370,7 +3370,7 @@ EXPORT_SYMBOL(tty_unregister_device);
 /**
  * __tty_alloc_driver -- allocate tty driver
  * @lines: count of lines this driver can handle at most
- * @owner: module which is repsonsible for this driver
+ * @owner: module which is responsible for this driver
  * @flags: some of TTY_DRIVER_* flags, will be set in driver->flags
  *
  * This should not be called directly, some of the provided macros should be
-- 
2.10.2

[PATCH v2] powerpc: make /proc/self/stack always print the current stack

2017-03-27 Thread Thadeu Lima de Souza Cascardo

For the current task, the kernel stack would only tell the last time the
process was rescheduled, if ever. Use the current stack pointer for the
current task.

Otherwise, every once in a while, the stacktrace printed when reading
/proc/self/stack would look like the process is running in userspace,
while it's not, which some may consider as a bug.

This is also consistent with some other architectures, like x86 and arm,
at least.

Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
---
 arch/powerpc/kernel/stacktrace.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 6671195..d534ed9 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -59,7 +59,14 @@ EXPORT_SYMBOL_GPL(save_stack_trace);
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-   save_context_stack(trace, tsk->thread.ksp, tsk, 0);
+   unsigned long sp;
+
+   if (tsk == current)
+   sp = current_stack_pointer();
+   else
+   sp = tsk->thread.ksp;
+
+   save_context_stack(trace, sp, tsk, 0);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.9.3

[PATCH v2] powerpc: make /proc/self/stack always print the current stack

2017-03-27 Thread Thadeu Lima de Souza Cascardo

For the current task, the kernel stack would only tell the last time the
process was rescheduled, if ever. Use the current stack pointer for the
current task.

Otherwise, every once in a while, the stacktrace printed when reading
/proc/self/stack would look like the process is running in userspace,
while it's not, which some may consider as a bug.

This is also consistent with some other architectures, like x86 and arm,
at least.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 arch/powerpc/kernel/stacktrace.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 6671195..d534ed9 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -59,7 +59,14 @@ EXPORT_SYMBOL_GPL(save_stack_trace);
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-   save_context_stack(trace, tsk->thread.ksp, tsk, 0);
+   unsigned long sp;
+
+   if (tsk == current)
+   sp = current_stack_pointer();
+   else
+   sp = tsk->thread.ksp;
+
+   save_context_stack(trace, sp, tsk, 0);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.9.3

[PATCH] powerpc: fix /proc/self/stack

2017-03-17 Thread Thadeu Lima de Souza Cascardo

For the current task, the kernel stack would only tell the last time the
process was rescheduled, if ever. Use the current stack pointer for the
current task.

This is also consistent with some other architectures.

Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
---
 arch/powerpc/kernel/stacktrace.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 6671195..2446066 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -59,7 +59,12 @@ EXPORT_SYMBOL_GPL(save_stack_trace);
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-   save_context_stack(trace, tsk->thread.ksp, tsk, 0);
+   unsigned long sp = tsk->thread.ksp;
+
+   if (tsk == current)
+   sp = current_stack_pointer();
+
+   save_context_stack(trace, sp, tsk, 0);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.9.3

[PATCH] powerpc: fix /proc/self/stack

2017-03-17 Thread Thadeu Lima de Souza Cascardo

For the current task, the kernel stack would only tell the last time the
process was rescheduled, if ever. Use the current stack pointer for the
current task.

This is also consistent with some other architectures.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 arch/powerpc/kernel/stacktrace.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 6671195..2446066 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -59,7 +59,12 @@ EXPORT_SYMBOL_GPL(save_stack_trace);
 
 void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
-   save_context_stack(trace, tsk->thread.ksp, tsk, 0);
+   unsigned long sp = tsk->thread.ksp;
+
+   if (tsk == current)
+   sp = current_stack_pointer();
+
+   save_context_stack(trace, sp, tsk, 0);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
-- 
2.9.3

[PATCH] serial: core: remove silly test for uart_state

2017-01-05 Thread Thadeu Lima de Souza Cascardo

The polling functions were checking for a NULL uart_state, which is
indexed from uart_driver->state. It should be always allocated and
non-NULL when the tty_driver is registered, and line should not be
larger than the tty_driver->num anyways.

Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@cascardo.eti.br>
---
 drivers/tty/serial/serial_core.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 9939c3d9912b..6f7247114ef8 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2330,9 +2330,6 @@ static int uart_poll_init(struct tty_driver *driver, int 
line, char *options)
int flow = 'n';
int ret = 0;
 
-   if (!state)
-   return -1;
-
tport = >port;
mutex_lock(>mutex);
 
@@ -2367,12 +2364,10 @@ static int uart_poll_get_char(struct tty_driver 
*driver, int line)
struct uart_port *port;
int ret = -1;
 
-   if (state) {
-   port = uart_port_ref(state);
-   if (port) {
-   ret = port->ops->poll_get_char(port);
-   uart_port_deref(port);
-   }
+   port = uart_port_ref(state);
+   if (port) {
+   ret = port->ops->poll_get_char(port);
+   uart_port_deref(port);
}
return ret;
 }
@@ -2383,9 +2378,6 @@ static void uart_poll_put_char(struct tty_driver *driver, 
int line, char ch)
struct uart_state *state = drv->state + line;
struct uart_port *port;
 
-   if (!state)
-   return;
-
port = uart_port_ref(state);
if (!port)
return;
-- 
2.11.0

[PATCH] serial: core: remove silly test for uart_state

2017-01-05 Thread Thadeu Lima de Souza Cascardo

The polling functions were checking for a NULL uart_state, which is
indexed from uart_driver->state. It should be always allocated and
non-NULL when the tty_driver is registered, and line should not be
larger than the tty_driver->num anyways.

Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 drivers/tty/serial/serial_core.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 9939c3d9912b..6f7247114ef8 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2330,9 +2330,6 @@ static int uart_poll_init(struct tty_driver *driver, int 
line, char *options)
int flow = 'n';
int ret = 0;
 
-   if (!state)
-   return -1;
-
tport = >port;
mutex_lock(>mutex);
 
@@ -2367,12 +2364,10 @@ static int uart_poll_get_char(struct tty_driver 
*driver, int line)
struct uart_port *port;
int ret = -1;
 
-   if (state) {
-   port = uart_port_ref(state);
-   if (port) {
-   ret = port->ops->poll_get_char(port);
-   uart_port_deref(port);
-   }
+   port = uart_port_ref(state);
+   if (port) {
+   ret = port->ops->poll_get_char(port);
+   uart_port_deref(port);
}
return ret;
 }
@@ -2383,9 +2378,6 @@ static void uart_poll_put_char(struct tty_driver *driver, 
int line, char ch)
struct uart_state *state = drv->state + line;
struct uart_port *port;
 
-   if (!state)
-   return;
-
port = uart_port_ref(state);
if (!port)
return;
-- 
2.11.0

Re: [PATCH] serial/8250: remove comment about schedule_timeout

2017-01-05 Thread Thadeu Lima de Souza Cascardo

On Thu, Jan 05, 2017 at 04:40:16PM +, One Thousand Gnomes wrote:
> On Thu,  5 Jan 2017 13:48:40 -0200
> Thadeu Lima de Souza Cascardo <casca...@canonical.com> wrote:
> 
> > Ted T'so has added the function size_fifo in 1999 for the 2.3 series
> > [1], a long time ago.
> > 
> > During the 2.5 cycle, Russell King has restructured the serial drivers
> > and, in that process, has suggested using schedule_timeout instead of
> > mdelay in size_fifo. [2]
> > 
> > It was only at 2.6.7 that Greg Kroah-Hartman added the msleep function
> > to the core kernel, as people were starting to duplicate it. [3]
> > 
> > However, as size_fifo is called under a spinlock from the autoconfig
> > function, we might not use msleep here, so removing that comment is the
> > appropriate thing to do.
> 
> No.. it's still a flaw in the driver that we can't use msleep here. It's
> one of those things that people want to know if the probe locking ever
> gets restructured.
> 
> Alan

Okay. At first, I wanted to simply use msleep there, then realized there
was the spinlock. I thought I got a good writeup of the history that was
not in git yet, maybe we should just add a better comment and more
details in the commit log. How about this as the comment?

Cascardo.

---
diff --git a/drivers/tty/serial/8250/8250_port.c 
b/drivers/tty/serial/8250/8250_port.c
index 1731b98d2471..51efbfcfb922 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -796,7 +796,8 @@ static int size_fifo(struct uart_8250_port *up)
serial_out(up, UART_LCR, 0x03);
for (count = 0; count < 256; count++)
serial_out(up, UART_TX, count);
-   mdelay(20);/* FIXME - schedule_timeout */
+   /* FIXME - use msleep when probe locking is restructured */
+   mdelay(20);
for (count = 0; (serial_in(up, UART_LSR) & UART_LSR_DR) &&
 (count < 256); count++)
serial_in(up, UART_RX);

Re: [PATCH] serial/8250: remove comment about schedule_timeout

2017-01-05 Thread Thadeu Lima de Souza Cascardo

On Thu, Jan 05, 2017 at 04:40:16PM +, One Thousand Gnomes wrote:
> On Thu,  5 Jan 2017 13:48:40 -0200
> Thadeu Lima de Souza Cascardo  wrote:
> 
> > Ted T'so has added the function size_fifo in 1999 for the 2.3 series
> > [1], a long time ago.
> > 
> > During the 2.5 cycle, Russell King has restructured the serial drivers
> > and, in that process, has suggested using schedule_timeout instead of
> > mdelay in size_fifo. [2]
> > 
> > It was only at 2.6.7 that Greg Kroah-Hartman added the msleep function
> > to the core kernel, as people were starting to duplicate it. [3]
> > 
> > However, as size_fifo is called under a spinlock from the autoconfig
> > function, we might not use msleep here, so removing that comment is the
> > appropriate thing to do.
> 
> No.. it's still a flaw in the driver that we can't use msleep here. It's
> one of those things that people want to know if the probe locking ever
> gets restructured.
> 
> Alan

Okay. At first, I wanted to simply use msleep there, then realized there
was the spinlock. I thought I got a good writeup of the history that was
not in git yet, maybe we should just add a better comment and more
details in the commit log. How about this as the comment?

Cascardo.

---
diff --git a/drivers/tty/serial/8250/8250_port.c 
b/drivers/tty/serial/8250/8250_port.c
index 1731b98d2471..51efbfcfb922 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -796,7 +796,8 @@ static int size_fifo(struct uart_8250_port *up)
serial_out(up, UART_LCR, 0x03);
for (count = 0; count < 256; count++)
serial_out(up, UART_TX, count);
-   mdelay(20);/* FIXME - schedule_timeout */
+   /* FIXME - use msleep when probe locking is restructured */
+   mdelay(20);
for (count = 0; (serial_in(up, UART_LSR) & UART_LSR_DR) &&
 (count < 256); count++)
serial_in(up, UART_RX);

[PATCH] serial/8250: remove comment about schedule_timeout

2017-01-05 Thread Thadeu Lima de Souza Cascardo

Ted T'so has added the function size_fifo in 1999 for the 2.3 series
[1], a long time ago.

During the 2.5 cycle, Russell King has restructured the serial drivers
and, in that process, has suggested using schedule_timeout instead of
mdelay in size_fifo. [2]

It was only at 2.6.7 that Greg Kroah-Hartman added the msleep function
to the core kernel, as people were starting to duplicate it. [3]

However, as size_fifo is called under a spinlock from the autoconfig
function, we might not use msleep here, so removing that comment is the
appropriate thing to do.

[1] http://lkml.iu.edu/hypermail/linux/kernel/9908.3/1229.html
[2] 
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=33c0d1b0c3ebb61243d9b19ce70d9063acff2aac
[3] 
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=8365c315507fe10925bb3281d7fe02935b25

Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Jiri Slaby <jsl...@suse.com>
Cc: linux-ser...@vger.kernel.org
Cc: Russell King <rmk+ker...@armlinux.org.uk>
Cc: Theodore Ts'o <ty...@mit.edu>
---
 drivers/tty/serial/8250/8250_port.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/8250/8250_port.c 
b/drivers/tty/serial/8250/8250_port.c
index 1731b98d2471..645daf551fdf 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -796,7 +796,7 @@ static int size_fifo(struct uart_8250_port *up)
serial_out(up, UART_LCR, 0x03);
for (count = 0; count < 256; count++)
serial_out(up, UART_TX, count);
-   mdelay(20);/* FIXME - schedule_timeout */
+   mdelay(20);
for (count = 0; (serial_in(up, UART_LSR) & UART_LSR_DR) &&
 (count < 256); count++)
serial_in(up, UART_RX);
-- 
2.11.0

[PATCH] serial/8250: remove comment about schedule_timeout

2017-01-05 Thread Thadeu Lima de Souza Cascardo

Ted T'so has added the function size_fifo in 1999 for the 2.3 series
[1], a long time ago.

During the 2.5 cycle, Russell King has restructured the serial drivers
and, in that process, has suggested using schedule_timeout instead of
mdelay in size_fifo. [2]

It was only at 2.6.7 that Greg Kroah-Hartman added the msleep function
to the core kernel, as people were starting to duplicate it. [3]

However, as size_fifo is called under a spinlock from the autoconfig
function, we might not use msleep here, so removing that comment is the
appropriate thing to do.

[1] http://lkml.iu.edu/hypermail/linux/kernel/9908.3/1229.html
[2] 
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=33c0d1b0c3ebb61243d9b19ce70d9063acff2aac
[3] 
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=8365c315507fe10925bb3281d7fe02935b25

Signed-off-by: Thadeu Lima de Souza Cascardo 
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: linux-ser...@vger.kernel.org
Cc: Russell King 
Cc: Theodore Ts'o 
---
 drivers/tty/serial/8250/8250_port.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/8250/8250_port.c 
b/drivers/tty/serial/8250/8250_port.c
index 1731b98d2471..645daf551fdf 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -796,7 +796,7 @@ static int size_fifo(struct uart_8250_port *up)
serial_out(up, UART_LCR, 0x03);
for (count = 0; count < 256; count++)
serial_out(up, UART_TX, count);
-   mdelay(20);/* FIXME - schedule_timeout */
+   mdelay(20);
for (count = 0; (serial_in(up, UART_LSR) & UART_LSR_DR) &&
 (count < 256); count++)
serial_in(up, UART_RX);
-- 
2.11.0

Re: [PATCHv2 1/7] ppc bpf/jit: Disable classic BPF JIT on ppc64le

2016-06-22 Thread Thadeu Lima de Souza Cascardo

On Wed, Jun 22, 2016 at 09:55:01PM +0530, Naveen N. Rao wrote:
> Classic BPF JIT was never ported completely to work on little endian
> powerpc. However, it can be enabled and will crash the system when used.
> As such, disable use of BPF JIT on ppc64le.

Thanks, Naveen.

Acked-by: Thadeu Lima de Souza Cascardo <casca...@redhat.com>

> 
> Cc: sta...@vger.kernel.org
> Cc: Matt Evans <m...@ozlabs.org>
> Cc: Denis Kirjanov <k...@linux-powerpc.org>
> Cc: Michael Ellerman <m...@ellerman.id.au>
> Cc: Paul Mackerras <pau...@samba.org>
> Cc: Alexei Starovoitov <a...@fb.com>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: "David S. Miller" <da...@davemloft.net>
> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
> Cc: Thadeu Lima de Souza Cascardo <casca...@redhat.com>
> Reported-by: Thadeu Lima de Souza Cascardo <casca...@redhat.com>
> Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 01f7464..0a9d439 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -128,7 +128,7 @@ config PPC
>   select IRQ_FORCED_THREADING
>   select HAVE_RCU_TABLE_FREE if SMP
>   select HAVE_SYSCALL_TRACEPOINTS
> - select HAVE_CBPF_JIT
> + select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
>   select HAVE_ARCH_JUMP_LABEL
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select ARCH_HAS_GCOV_PROFILE_ALL
> -- 
> 2.8.2
>

Re: [PATCHv2 1/7] ppc bpf/jit: Disable classic BPF JIT on ppc64le

2016-06-22 Thread Thadeu Lima de Souza Cascardo

On Wed, Jun 22, 2016 at 09:55:01PM +0530, Naveen N. Rao wrote:
> Classic BPF JIT was never ported completely to work on little endian
> powerpc. However, it can be enabled and will crash the system when used.
> As such, disable use of BPF JIT on ppc64le.

Thanks, Naveen.

Acked-by: Thadeu Lima de Souza Cascardo 

> 
> Cc: sta...@vger.kernel.org
> Cc: Matt Evans 
> Cc: Denis Kirjanov 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> Cc: "David S. Miller" 
> Cc: Ananth N Mavinakayanahalli 
> Cc: Thadeu Lima de Souza Cascardo 
> Reported-by: Thadeu Lima de Souza Cascardo 
> Signed-off-by: Naveen N. Rao 
> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 01f7464..0a9d439 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -128,7 +128,7 @@ config PPC
>   select IRQ_FORCED_THREADING
>   select HAVE_RCU_TABLE_FREE if SMP
>   select HAVE_SYSCALL_TRACEPOINTS
> - select HAVE_CBPF_JIT
> + select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
>   select HAVE_ARCH_JUMP_LABEL
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select ARCH_HAS_GCOV_PROFILE_ALL
> -- 
> 2.8.2
>

Re: [PATCH] ppc: Fix BPF JIT for ABIv2

2016-06-21 Thread Thadeu Lima de Souza Cascardo

On Tue, Jun 21, 2016 at 09:15:48PM +1000, Michael Ellerman wrote:
> On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > > > 
> > > > > Hi, Michael and Naveen.
> > > > > 
> > > > > I noticed independently that there is a problem with BPF JIT and 
> > > > > ABIv2, and
> > > > > worked out the patch below before I noticed Naveen's patchset and the 
> > > > > latest
> > > > > changes in ppc tree for a better way to check for ABI versions.
> > > > > 
> > > > > However, since the issue described below affect mainline and stable 
> > > > > kernels,
> > > > > would you consider applying it before merging your two patchsets, so 
> > > > > that we can
> > > > > more easily backport the fix?
> > > > 
> > > > Hi Cascardo,
> > > > Given that this has been broken on ABIv2 since forever, I didn't bother 
> > > > fixing it. But, I can see why this would be a good thing to have for 
> > > > -stable and existing distros. However, while your patch below may fix 
> > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> > > > changes in bpf_jit_asm.S as well.
> > > 
> > > Hi, Naveen.
> > > 
> > > Any tips on how to exercise possible issues there? Or what changes you 
> > > think
> > > would be sufficient?
> > 
> > The calling convention is different with ABIv2 and so we'll need changes 
> > in bpf_slow_path_common() and sk_negative_common().
> 
> How big would those changes be? Do we know?
> 
> How come no one reported this was broken previously? This is the first I've
> heard of it being broken.
> 

I just heard of it less than two weeks ago, and only could investigate it last
week, when I realized mainline was also affected.

It looks like the little-endian support for classic JIT were done before the
conversion to ABIv2. And as JIT is disabled by default, no one seems to have
exercised it.

> > However, rather than enabling classic JIT for ppc64le, are we better off 
> > just disabling it?
> > 
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -128,7 +128,7 @@ config PPC
> > select IRQ_FORCED_THREADING
> > select HAVE_RCU_TABLE_FREE if SMP
> > select HAVE_SYSCALL_TRACEPOINTS
> > -   select HAVE_CBPF_JIT
> > +   select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
> > select HAVE_ARCH_JUMP_LABEL
> > select ARCH_HAVE_NMI_SAFE_CMPXCHG
> > select ARCH_HAS_GCOV_PROFILE_ALL
> > 
> > 
> > Michael,
> > Let me know your thoughts on whether you intend to take this patch or 
> > Cascardo's patch for -stable before the eBPF patches. I can redo my 
> > patches accordingly.
> 
> This patch sounds like the best option at the moment for something we can
> backport. Unless the changes to fix it are minimal.
> 
> cheers
> 

With my patch only, I can run a minimal tcpdump tcp port 22 with success. It
correctly filter packets. But as pointed out, slow paths may not be taken.

I don't have strong opinions on what to apply to stable, just that it would be
nice to have something for the crash before applying all the nice changes by
Naveen.

Cascardo.

Re: [PATCH] ppc: Fix BPF JIT for ABIv2

2016-06-21 Thread Thadeu Lima de Souza Cascardo

On Tue, Jun 21, 2016 at 09:15:48PM +1000, Michael Ellerman wrote:
> On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > > > 
> > > > > Hi, Michael and Naveen.
> > > > > 
> > > > > I noticed independently that there is a problem with BPF JIT and 
> > > > > ABIv2, and
> > > > > worked out the patch below before I noticed Naveen's patchset and the 
> > > > > latest
> > > > > changes in ppc tree for a better way to check for ABI versions.
> > > > > 
> > > > > However, since the issue described below affect mainline and stable 
> > > > > kernels,
> > > > > would you consider applying it before merging your two patchsets, so 
> > > > > that we can
> > > > > more easily backport the fix?
> > > > 
> > > > Hi Cascardo,
> > > > Given that this has been broken on ABIv2 since forever, I didn't bother 
> > > > fixing it. But, I can see why this would be a good thing to have for 
> > > > -stable and existing distros. However, while your patch below may fix 
> > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> > > > changes in bpf_jit_asm.S as well.
> > > 
> > > Hi, Naveen.
> > > 
> > > Any tips on how to exercise possible issues there? Or what changes you 
> > > think
> > > would be sufficient?
> > 
> > The calling convention is different with ABIv2 and so we'll need changes 
> > in bpf_slow_path_common() and sk_negative_common().
> 
> How big would those changes be? Do we know?
> 
> How come no one reported this was broken previously? This is the first I've
> heard of it being broken.
> 

I just heard of it less than two weeks ago, and only could investigate it last
week, when I realized mainline was also affected.

It looks like the little-endian support for classic JIT were done before the
conversion to ABIv2. And as JIT is disabled by default, no one seems to have
exercised it.

> > However, rather than enabling classic JIT for ppc64le, are we better off 
> > just disabling it?
> > 
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -128,7 +128,7 @@ config PPC
> > select IRQ_FORCED_THREADING
> > select HAVE_RCU_TABLE_FREE if SMP
> > select HAVE_SYSCALL_TRACEPOINTS
> > -   select HAVE_CBPF_JIT
> > +   select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
> > select HAVE_ARCH_JUMP_LABEL
> > select ARCH_HAVE_NMI_SAFE_CMPXCHG
> > select ARCH_HAS_GCOV_PROFILE_ALL
> > 
> > 
> > Michael,
> > Let me know your thoughts on whether you intend to take this patch or 
> > Cascardo's patch for -stable before the eBPF patches. I can redo my 
> > patches accordingly.
> 
> This patch sounds like the best option at the moment for something we can
> backport. Unless the changes to fix it are minimal.
> 
> cheers
> 

With my patch only, I can run a minimal tcpdump tcp port 22 with success. It
correctly filter packets. But as pointed out, slow paths may not be taken.

I don't have strong opinions on what to apply to stable, just that it would be
nice to have something for the crash before applying all the nice changes by
Naveen.

Cascardo.

Re: [PATCH] ppc: Fix BPF JIT for ABIv2

2016-06-20 Thread Thadeu Lima de Souza Cascardo

On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > On Fri, Jun 17, 2016 at 10:53:21PM +1000, Michael Ellerman wrote:
> > > On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> > > > b/arch/powerpc/net/bpf_jit_comp64.c
> > > > new file mode 100644
> > > > index 000..954ff53
> > > > --- /dev/null
> > > > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > > > @@ -0,0 +1,956 @@
> > > ...
> > > > +
> > > > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > > > +{
> > > > +   int *p = area;
> > > > +
> > > > +   /* Fill whole space with trap instructions */
> > > > +   while (p < (int *)((char *)area + size))
> > > > +   *p++ = BREAKPOINT_INSTRUCTION;
> > > > +}
> > > 
> > > This breaks the build for some configs, presumably you're missing a 
> > > header:
> > > 
> > >   arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 
> > > 'BREAKPOINT_INSTRUCTION' undeclared (first use in this function)
> > > 
> > > http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
> > > 
> > > cheers
> > 
> > Hi, Michael and Naveen.
> > 
> > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > worked out the patch below before I noticed Naveen's patchset and the latest
> > changes in ppc tree for a better way to check for ABI versions.
> > 
> > However, since the issue described below affect mainline and stable kernels,
> > would you consider applying it before merging your two patchsets, so that 
> > we can
> > more easily backport the fix?
> 
> Hi Cascardo,
> Given that this has been broken on ABIv2 since forever, I didn't bother 
> fixing it. But, I can see why this would be a good thing to have for 
> -stable and existing distros. However, while your patch below may fix 
> the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> changes in bpf_jit_asm.S as well.

Hi, Naveen.

Any tips on how to exercise possible issues there? Or what changes you think
would be sufficient?

I will see what I can find by myself, but would appreciate any help.

Regards.
Cascardo.

> 
> Regards,
> Naveen
>

Re: [PATCH] ppc: Fix BPF JIT for ABIv2

2016-06-20 Thread Thadeu Lima de Souza Cascardo

On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > On Fri, Jun 17, 2016 at 10:53:21PM +1000, Michael Ellerman wrote:
> > > On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> > > > b/arch/powerpc/net/bpf_jit_comp64.c
> > > > new file mode 100644
> > > > index 000..954ff53
> > > > --- /dev/null
> > > > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > > > @@ -0,0 +1,956 @@
> > > ...
> > > > +
> > > > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > > > +{
> > > > +   int *p = area;
> > > > +
> > > > +   /* Fill whole space with trap instructions */
> > > > +   while (p < (int *)((char *)area + size))
> > > > +   *p++ = BREAKPOINT_INSTRUCTION;
> > > > +}
> > > 
> > > This breaks the build for some configs, presumably you're missing a 
> > > header:
> > > 
> > >   arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 
> > > 'BREAKPOINT_INSTRUCTION' undeclared (first use in this function)
> > > 
> > > http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
> > > 
> > > cheers
> > 
> > Hi, Michael and Naveen.
> > 
> > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > worked out the patch below before I noticed Naveen's patchset and the latest
> > changes in ppc tree for a better way to check for ABI versions.
> > 
> > However, since the issue described below affect mainline and stable kernels,
> > would you consider applying it before merging your two patchsets, so that 
> > we can
> > more easily backport the fix?
> 
> Hi Cascardo,
> Given that this has been broken on ABIv2 since forever, I didn't bother 
> fixing it. But, I can see why this would be a good thing to have for 
> -stable and existing distros. However, while your patch below may fix 
> the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> changes in bpf_jit_asm.S as well.

Hi, Naveen.

Any tips on how to exercise possible issues there? Or what changes you think
would be sufficient?

I will see what I can find by myself, but would appreciate any help.

Regards.
Cascardo.

> 
> Regards,
> Naveen
>

[PATCH] ppc: Fix BPF JIT for ABIv2

2016-06-17 Thread Thadeu Lima de Souza Cascardo

On Fri, Jun 17, 2016 at 10:53:21PM +1000, Michael Ellerman wrote:
> On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> > b/arch/powerpc/net/bpf_jit_comp64.c
> > new file mode 100644
> > index 000..954ff53
> > --- /dev/null
> > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > @@ -0,0 +1,956 @@
> ...
> > +
> > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > +{
> > +   int *p = area;
> > +
> > +   /* Fill whole space with trap instructions */
> > +   while (p < (int *)((char *)area + size))
> > +   *p++ = BREAKPOINT_INSTRUCTION;
> > +}
> 
> This breaks the build for some configs, presumably you're missing a header:
> 
>   arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 'BREAKPOINT_INSTRUCTION' 
> undeclared (first use in this function)
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
> 
> cheers

Hi, Michael and Naveen.

I noticed independently that there is a problem with BPF JIT and ABIv2, and
worked out the patch below before I noticed Naveen's patchset and the latest
changes in ppc tree for a better way to check for ABI versions.

However, since the issue described below affect mainline and stable kernels,
would you consider applying it before merging your two patchsets, so that we can
more easily backport the fix?

Thanks.
Cascardo.

---
>From a984dc02b6317a1d3a3c2302385adba5227be5bd Mon Sep 17 00:00:00 2001
From: Thadeu Lima de Souza Cascardo <casca...@redhat.com>
Date: Wed, 15 Jun 2016 13:22:12 -0300
Subject: [PATCH] ppc: Fix BPF JIT for ABIv2

ABIv2 used for ppc64le does not use function descriptors. Without this patch,
whenever BPF JIT is enabled, we get a crash as below.

[root@ibm-p8-kvm-05-guest-02 ~]# echo 2 > /proc/sys/net/core/bpf_jit_enable
[root@ibm-p8-kvm-05-guest-02 ~]# tcpdump -n -i eth0 tcp port 22
device eth0 entered promiscuous mode
Pass 1: shrink = 0, seen = 0x0
Pass 2: shrink = 0, seen = 0x0
flen=1 proglen=8 pass=3 image=d5bb9018 from=tcpdump pid=11387
JIT code: : 00 00 60 38 20 00 80 4e
Pass 1: shrink = 0, seen = 0x3
Pass 2: shrink = 0, seen = 0x3
flen=20 proglen=524 pass=3 image=d5bbd018 from=tcpdump pid=11387
JIT code: : a6 02 08 7c 10 00 01 f8 70 ff c1 f9 78 ff e1 f9
JIT code: 0010: e1 fe 21 f8 7c 00 e3 80 78 00 e3 81 50 78 e7 7d
JIT code: 0020: c8 00 c3 e9 00 00 a0 38 00 c0 e0 3c c6 07 e7 78
JIT code: 0030: 08 00 e7 64 54 1b e7 60 a6 03 e8 7c 0c 00 c0 38
JIT code: 0040: 21 00 80 4e b0 01 80 41 00 00 00 60 dd 86 e0 38
JIT code: 0050: 01 00 e7 3c 40 38 04 7c 9c 00 82 40 00 00 00 60
JIT code: 0060: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 70 1b e7 60
JIT code: 0070: a6 03 e8 7c 14 00 c0 38 21 00 80 4e 78 01 80 41
JIT code: 0080: 00 00 00 60 06 00 04 28 68 01 82 40 00 00 00 60
JIT code: 0090: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 00a0: a6 03 e8 7c 36 00 c0 38 21 00 80 4e 48 01 80 41
JIT code: 00b0: 00 00 00 60 16 00 04 28 2c 01 82 41 00 00 00 60
JIT code: 00c0: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 00d0: a6 03 e8 7c 38 00 c0 38 21 00 80 4e 18 01 80 41
JIT code: 00e0: 00 00 00 60 16 00 04 28 fc 00 82 41 00 00 00 60
JIT code: 00f0: 00 01 00 48 00 08 04 28 f8 00 82 40 00 00 00 60
JIT code: 0100: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 70 1b e7 60
JIT code: 0110: a6 03 e8 7c 17 00 c0 38 21 00 80 4e d8 00 80 41
JIT code: 0120: 00 00 00 60 06 00 04 28 c8 00 82 40 00 00 00 60
JIT code: 0130: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 0140: a6 03 e8 7c 14 00 c0 38 21 00 80 4e a8 00 80 41
JIT code: 0150: 00 00 00 60 ff 1f 87 70 98 00 82 40 00 00 00 60
JIT code: 0160: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 88 1b e7 60
JIT code: 0170: a6 03 e8 7c 0e 00 c0 38 21 00 80 4e 78 00 80 41
JIT code: 0180: 00 00 00 60 00 c0 e0 3c c6 07 e7 78 08 00 e7 64
JIT code: 0190: 4c 1b e7 60 a6 03 e8 7c 0e 00 c5 38 21 00 80 4e
JIT code: 01a0: 54 00 80 41 00 00 00 60 16 00 04 28 38 00 82 41
JIT code: 01b0: 00 00 00 60 00 c0 e0 3c c6 07 e7 78 08 00 e7 64
JIT code: 01c0: 4c 1b e7 60 a6 03 e8 7c 10 00 c5 38 21 00 80 4e
JIT code: 01d0: 24 00 80 41 00 00 00 60 16 00 04 28 14 00 82 40
JIT code: 01e0: 00 00 00 60 ff ff 60 38 01 00 63 3c 08 00 00 48
JIT code: 01f0: 00 00 60 38 20 01 21 38 10 00 01 e8 a6 03 08 7c
JIT code: 0200: 70 ff c1 e9 78 ff e1 e9 20 00 80 4e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in: virtio_balloon nfsd ip_tables x_tables autofs4 xfs libcrc32c 
virtio_console virtio_net virtio_pci virtio_ring virtio
CPU: 1 PID: 0 Comm: swapper/1 No

[PATCH] ppc: Fix BPF JIT for ABIv2

2016-06-17 Thread Thadeu Lima de Souza Cascardo

On Fri, Jun 17, 2016 at 10:53:21PM +1000, Michael Ellerman wrote:
> On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> > b/arch/powerpc/net/bpf_jit_comp64.c
> > new file mode 100644
> > index 000..954ff53
> > --- /dev/null
> > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > @@ -0,0 +1,956 @@
> ...
> > +
> > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > +{
> > +   int *p = area;
> > +
> > +   /* Fill whole space with trap instructions */
> > +   while (p < (int *)((char *)area + size))
> > +   *p++ = BREAKPOINT_INSTRUCTION;
> > +}
> 
> This breaks the build for some configs, presumably you're missing a header:
> 
>   arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 'BREAKPOINT_INSTRUCTION' 
> undeclared (first use in this function)
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
> 
> cheers

Hi, Michael and Naveen.

I noticed independently that there is a problem with BPF JIT and ABIv2, and
worked out the patch below before I noticed Naveen's patchset and the latest
changes in ppc tree for a better way to check for ABI versions.

However, since the issue described below affect mainline and stable kernels,
would you consider applying it before merging your two patchsets, so that we can
more easily backport the fix?

Thanks.
Cascardo.

---
>From a984dc02b6317a1d3a3c2302385adba5227be5bd Mon Sep 17 00:00:00 2001
From: Thadeu Lima de Souza Cascardo 
Date: Wed, 15 Jun 2016 13:22:12 -0300
Subject: [PATCH] ppc: Fix BPF JIT for ABIv2

ABIv2 used for ppc64le does not use function descriptors. Without this patch,
whenever BPF JIT is enabled, we get a crash as below.

[root@ibm-p8-kvm-05-guest-02 ~]# echo 2 > /proc/sys/net/core/bpf_jit_enable
[root@ibm-p8-kvm-05-guest-02 ~]# tcpdump -n -i eth0 tcp port 22
device eth0 entered promiscuous mode
Pass 1: shrink = 0, seen = 0x0
Pass 2: shrink = 0, seen = 0x0
flen=1 proglen=8 pass=3 image=d5bb9018 from=tcpdump pid=11387
JIT code: : 00 00 60 38 20 00 80 4e
Pass 1: shrink = 0, seen = 0x3
Pass 2: shrink = 0, seen = 0x3
flen=20 proglen=524 pass=3 image=d5bbd018 from=tcpdump pid=11387
JIT code: : a6 02 08 7c 10 00 01 f8 70 ff c1 f9 78 ff e1 f9
JIT code: 0010: e1 fe 21 f8 7c 00 e3 80 78 00 e3 81 50 78 e7 7d
JIT code: 0020: c8 00 c3 e9 00 00 a0 38 00 c0 e0 3c c6 07 e7 78
JIT code: 0030: 08 00 e7 64 54 1b e7 60 a6 03 e8 7c 0c 00 c0 38
JIT code: 0040: 21 00 80 4e b0 01 80 41 00 00 00 60 dd 86 e0 38
JIT code: 0050: 01 00 e7 3c 40 38 04 7c 9c 00 82 40 00 00 00 60
JIT code: 0060: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 70 1b e7 60
JIT code: 0070: a6 03 e8 7c 14 00 c0 38 21 00 80 4e 78 01 80 41
JIT code: 0080: 00 00 00 60 06 00 04 28 68 01 82 40 00 00 00 60
JIT code: 0090: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 00a0: a6 03 e8 7c 36 00 c0 38 21 00 80 4e 48 01 80 41
JIT code: 00b0: 00 00 00 60 16 00 04 28 2c 01 82 41 00 00 00 60
JIT code: 00c0: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 00d0: a6 03 e8 7c 38 00 c0 38 21 00 80 4e 18 01 80 41
JIT code: 00e0: 00 00 00 60 16 00 04 28 fc 00 82 41 00 00 00 60
JIT code: 00f0: 00 01 00 48 00 08 04 28 f8 00 82 40 00 00 00 60
JIT code: 0100: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 70 1b e7 60
JIT code: 0110: a6 03 e8 7c 17 00 c0 38 21 00 80 4e d8 00 80 41
JIT code: 0120: 00 00 00 60 06 00 04 28 c8 00 82 40 00 00 00 60
JIT code: 0130: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 54 1b e7 60
JIT code: 0140: a6 03 e8 7c 14 00 c0 38 21 00 80 4e a8 00 80 41
JIT code: 0150: 00 00 00 60 ff 1f 87 70 98 00 82 40 00 00 00 60
JIT code: 0160: 00 c0 e0 3c c6 07 e7 78 08 00 e7 64 88 1b e7 60
JIT code: 0170: a6 03 e8 7c 0e 00 c0 38 21 00 80 4e 78 00 80 41
JIT code: 0180: 00 00 00 60 00 c0 e0 3c c6 07 e7 78 08 00 e7 64
JIT code: 0190: 4c 1b e7 60 a6 03 e8 7c 0e 00 c5 38 21 00 80 4e
JIT code: 01a0: 54 00 80 41 00 00 00 60 16 00 04 28 38 00 82 41
JIT code: 01b0: 00 00 00 60 00 c0 e0 3c c6 07 e7 78 08 00 e7 64
JIT code: 01c0: 4c 1b e7 60 a6 03 e8 7c 10 00 c5 38 21 00 80 4e
JIT code: 01d0: 24 00 80 41 00 00 00 60 16 00 04 28 14 00 82 40
JIT code: 01e0: 00 00 00 60 ff ff 60 38 01 00 63 3c 08 00 00 48
JIT code: 01f0: 00 00 60 38 20 01 21 38 10 00 01 e8 a6 03 08 7c
JIT code: 0200: 70 ff c1 e9 78 ff e1 e9 20 00 80 4e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in: virtio_balloon nfsd ip_tables x_tables autofs4 xfs libcrc32c 
virtio_console virtio_net virtio_pci virtio_ring virtio
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-9-gdb

Re: [PATCH 1/1] net: ehea: avoid null pointer dereference

2016-05-17 Thread Thadeu Lima de Souza Cascardo

On Tue, May 17, 2016 at 10:28:54PM +0200, Heinrich Schuchardt wrote:
> ehea_get_port may return NULL. Do not dereference NULL value.
> 
> Fixes: 8c4877a4128e ("ehea: Use the standard logging functions")
> Signed-off-by: Heinrich Schuchardt <xypron.g...@gmx.de>

Acked-by: Thadeu Lima de Souza Cascardo <casca...@debian.org>

> ---
>  drivers/net/ethernet/ibm/ehea/ehea_main.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c 
> b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> index 2a0dc12..54efa9a 100644
> --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
> +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> @@ -1169,16 +1169,15 @@ static void ehea_parse_eqe(struct ehea_adapter 
> *adapter, u64 eqe)
>   ec = EHEA_BMASK_GET(NEQE_EVENT_CODE, eqe);
>   portnum = EHEA_BMASK_GET(NEQE_PORTNUM, eqe);
>   port = ehea_get_port(adapter, portnum);
> + if (!port) {
> + netdev_err(NULL, "unknown portnum %x\n", portnum);
> + return;
> + }
>   dev = port->netdev;
>  
>   switch (ec) {
>   case EHEA_EC_PORTSTATE_CHG: /* port state change */
>  
> - if (!port) {
> - netdev_err(dev, "unknown portnum %x\n", portnum);
> - break;
> - }
> -
>   if (EHEA_BMASK_GET(NEQE_PORT_UP, eqe)) {
>   if (!netif_carrier_ok(dev)) {
>   ret = ehea_sense_port_attr(port);
> -- 
> 2.1.4
>

Re: [PATCH 1/1] net: ehea: avoid null pointer dereference

2016-05-17 Thread Thadeu Lima de Souza Cascardo

On Tue, May 17, 2016 at 10:28:54PM +0200, Heinrich Schuchardt wrote:
> ehea_get_port may return NULL. Do not dereference NULL value.
> 
> Fixes: 8c4877a4128e ("ehea: Use the standard logging functions")
> Signed-off-by: Heinrich Schuchardt 

Acked-by: Thadeu Lima de Souza Cascardo 

> ---
>  drivers/net/ethernet/ibm/ehea/ehea_main.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c 
> b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> index 2a0dc12..54efa9a 100644
> --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
> +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> @@ -1169,16 +1169,15 @@ static void ehea_parse_eqe(struct ehea_adapter 
> *adapter, u64 eqe)
>   ec = EHEA_BMASK_GET(NEQE_EVENT_CODE, eqe);
>   portnum = EHEA_BMASK_GET(NEQE_PORTNUM, eqe);
>   port = ehea_get_port(adapter, portnum);
> + if (!port) {
> + netdev_err(NULL, "unknown portnum %x\n", portnum);
> + return;
> + }
>   dev = port->netdev;
>  
>   switch (ec) {
>   case EHEA_EC_PORTSTATE_CHG: /* port state change */
>  
> - if (!port) {
> - netdev_err(dev, "unknown portnum %x\n", portnum);
> - break;
> - }
> -
>   if (EHEA_BMASK_GET(NEQE_PORT_UP, eqe)) {
>   if (!netif_carrier_ok(dev)) {
>   ret = ehea_sense_port_attr(port);
> -- 
> 2.1.4
>

Re: [PATCH] ehea: Drop owner assignment from platform_driver

2016-02-19 Thread Thadeu Lima de Souza Cascardo

On Fri, Feb 19, 2016 at 07:06:46AM -0500, Julia Lawall wrote:
> On Fri, 19 Feb 2016, Thadeu Lima de Souza Cascardo wrote:
> > On Fri, Feb 19, 2016 at 04:52:19PM +0530, Amitoj Kaur Chawla wrote:
> > > platform_driver does not need to set an owner, it will be populated by
> > > the driver core.
> > >
> > > Generated-by: scripts/coccinelle/api/platform_no_drv_owner.cocci
> > >
> > > Signed-off-by: Amitoj Kaur Chawla <amitoj1...@gmail.com>
> > > ---
> > >  drivers/net/ethernet/ibm/ehea/ehea_main.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c 
> > > b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> > > index 2a0dc12..d4b022f 100644
> > > --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
> > > +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> > > @@ -127,7 +127,6 @@ static const struct of_device_id ehea_device_table[] 
> > > = {
> > >  static struct platform_driver ehea_driver = {
> > >   .driver = {
> > >   .name = "ehea",
> > > - .owner = THIS_MODULE,
> > >   .of_match_table = ehea_device_table,
> > >   },
> > >   .probe = ehea_probe_adapter,
> > > --
> > > 1.9.1
> > >
> >
> > NACK.
> >
> > ehea does not use platform_driver_register, it uses
> > ibmebus_register_driver, which does not set owner.
> 
> Thanks for the information.  I will try to update the saemantic patch.
> 
> julia

Thanks, Julia. I appreciate your work on that.

Cascardo.

Re: [PATCH] ehea: Drop owner assignment from platform_driver

2016-02-19 Thread Thadeu Lima de Souza Cascardo

On Fri, Feb 19, 2016 at 07:06:46AM -0500, Julia Lawall wrote:
> On Fri, 19 Feb 2016, Thadeu Lima de Souza Cascardo wrote:
> > On Fri, Feb 19, 2016 at 04:52:19PM +0530, Amitoj Kaur Chawla wrote:
> > > platform_driver does not need to set an owner, it will be populated by
> > > the driver core.
> > >
> > > Generated-by: scripts/coccinelle/api/platform_no_drv_owner.cocci
> > >
> > > Signed-off-by: Amitoj Kaur Chawla 
> > > ---
> > >  drivers/net/ethernet/ibm/ehea/ehea_main.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c 
> > > b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> > > index 2a0dc12..d4b022f 100644
> > > --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
> > > +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> > > @@ -127,7 +127,6 @@ static const struct of_device_id ehea_device_table[] 
> > > = {
> > >  static struct platform_driver ehea_driver = {
> > >   .driver = {
> > >   .name = "ehea",
> > > - .owner = THIS_MODULE,
> > >   .of_match_table = ehea_device_table,
> > >   },
> > >   .probe = ehea_probe_adapter,
> > > --
> > > 1.9.1
> > >
> >
> > NACK.
> >
> > ehea does not use platform_driver_register, it uses
> > ibmebus_register_driver, which does not set owner.
> 
> Thanks for the information.  I will try to update the saemantic patch.
> 
> julia

Thanks, Julia. I appreciate your work on that.

Cascardo.

Re: [PATCH] ehea: Drop owner assignment from platform_driver

2016-02-19 Thread Thadeu Lima de Souza Cascardo

On Fri, Feb 19, 2016 at 04:52:19PM +0530, Amitoj Kaur Chawla wrote:
> platform_driver does not need to set an owner, it will be populated by
> the driver core.
> 
> Generated-by: scripts/coccinelle/api/platform_no_drv_owner.cocci
> 
> Signed-off-by: Amitoj Kaur Chawla 
> ---
>  drivers/net/ethernet/ibm/ehea/ehea_main.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c 
> b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> index 2a0dc12..d4b022f 100644
> --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
> +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> @@ -127,7 +127,6 @@ static const struct of_device_id ehea_device_table[] = {
>  static struct platform_driver ehea_driver = {
>   .driver = {
>   .name = "ehea",
> - .owner = THIS_MODULE,
>   .of_match_table = ehea_device_table,
>   },
>   .probe = ehea_probe_adapter,
> -- 
> 1.9.1
> 

NACK.

ehea does not use platform_driver_register, it uses
ibmebus_register_driver, which does not set owner.

Cascardo.

Re: [PATCH] ehea: Drop owner assignment from platform_driver

2016-02-19 Thread Thadeu Lima de Souza Cascardo

On Fri, Feb 19, 2016 at 04:52:19PM +0530, Amitoj Kaur Chawla wrote:
> platform_driver does not need to set an owner, it will be populated by
> the driver core.
> 
> Generated-by: scripts/coccinelle/api/platform_no_drv_owner.cocci
> 
> Signed-off-by: Amitoj Kaur Chawla 
> ---
>  drivers/net/ethernet/ibm/ehea/ehea_main.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c 
> b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> index 2a0dc12..d4b022f 100644
> --- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
> +++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
> @@ -127,7 +127,6 @@ static const struct of_device_id ehea_device_table[] = {
>  static struct platform_driver ehea_driver = {
>   .driver = {
>   .name = "ehea",
> - .owner = THIS_MODULE,
>   .of_match_table = ehea_device_table,
>   },
>   .probe = ehea_probe_adapter,
> -- 
> 1.9.1
> 

NACK.

ehea does not use platform_driver_register, it uses
ibmebus_register_driver, which does not set owner.

Cascardo.

Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Thadeu Lima de Souza Cascardo

On Wed, Jan 06, 2016 at 11:11:41PM +0300, Konstantin Khlebnikov wrote:
> On Wed, Jan 6, 2016 at 10:59 PM, Cong Wang  wrote:
> > On Wed, Jan 6, 2016 at 11:15 AM, Konstantin Khlebnikov  
> > wrote:
> >> Looks like this happens because ip_options_fragment() relies on
> >> correct ip options length in ip control block in skb. But in
> >> ip_finish_output_gso() control block in segments is reused by
> >> skb_gso_segment(). following ip_fragment() sees some garbage.
> >>
> >> In my case there was no ip options but length becomes non-zero and
> >> ip_options_fragment() picked some bytes from payload and decides to
> >> fill huge range with IPOPT_NOOP (1). One of that ones flipped nr_frags
> >> in skb_shared_info at the end of data =)
> >>
> >
> > Hmm, it looks like SKB_GSO_CB should be cleared after skb_gso_segment()
> > since all the gso information should be saved in shared_info after it 
> > finishes.
> >
> > Does a memset(0) on SKB_GSO_CB after skb_gso_segment() work as well?
> 
> This will break present logic around ip_options_fragment() - it clears
> options from
> second and following fragments. With zeroed cb it will do nothing.
> 
> ip_options_fragment() can get required information directly from ip header but
> it also resets fields in IPCB -- probably it should stay valid here
> and somebody else will use it later.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

I have hit this as well, this fixes it for me on an older kernel. Can you try it
on latest kernel?

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d8a1745..f44bc91 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -216,6 +216,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
netdev_features_t features;
struct sk_buff *segs;
int ret = 0;
+   struct inet_skb_parm ipcb;
 
if (skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
return ip_finish_output2(skb);
@@ -227,6 +228,10 @@ static int ip_finish_output_gso(struct sk_buff *skb)
 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
 * from host network stack.
 */
+   /* We need to save IPCB here because skb_gso_segment will use
+* SKB_GSO_CB.
+*/
+   ipcb = *IPCB(skb);
features = netif_skb_features(skb);
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
if (IS_ERR_OR_NULL(segs)) {
@@ -241,6 +246,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
int err;
 
segs->next = NULL;
+   *IPCB(segs) = ipcb;
err = ip_fragment(segs, ip_finish_output2);
 
if (err && ret == 0)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Thadeu Lima de Souza Cascardo

On Wed, Jan 06, 2016 at 11:11:41PM +0300, Konstantin Khlebnikov wrote:
> On Wed, Jan 6, 2016 at 10:59 PM, Cong Wang  wrote:
> > On Wed, Jan 6, 2016 at 11:15 AM, Konstantin Khlebnikov  
> > wrote:
> >> Looks like this happens because ip_options_fragment() relies on
> >> correct ip options length in ip control block in skb. But in
> >> ip_finish_output_gso() control block in segments is reused by
> >> skb_gso_segment(). following ip_fragment() sees some garbage.
> >>
> >> In my case there was no ip options but length becomes non-zero and
> >> ip_options_fragment() picked some bytes from payload and decides to
> >> fill huge range with IPOPT_NOOP (1). One of that ones flipped nr_frags
> >> in skb_shared_info at the end of data =)
> >>
> >
> > Hmm, it looks like SKB_GSO_CB should be cleared after skb_gso_segment()
> > since all the gso information should be saved in shared_info after it 
> > finishes.
> >
> > Does a memset(0) on SKB_GSO_CB after skb_gso_segment() work as well?
> 
> This will break present logic around ip_options_fragment() - it clears
> options from
> second and following fragments. With zeroed cb it will do nothing.
> 
> ip_options_fragment() can get required information directly from ip header but
> it also resets fields in IPCB -- probably it should stay valid here
> and somebody else will use it later.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

I have hit this as well, this fixes it for me on an older kernel. Can you try it
on latest kernel?

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d8a1745..f44bc91 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -216,6 +216,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
netdev_features_t features;
struct sk_buff *segs;
int ret = 0;
+   struct inet_skb_parm ipcb;
 
if (skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
return ip_finish_output2(skb);
@@ -227,6 +228,10 @@ static int ip_finish_output_gso(struct sk_buff *skb)
 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
 * from host network stack.
 */
+   /* We need to save IPCB here because skb_gso_segment will use
+* SKB_GSO_CB.
+*/
+   ipcb = *IPCB(skb);
features = netif_skb_features(skb);
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
if (IS_ERR_OR_NULL(segs)) {
@@ -241,6 +246,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
int err;
 
segs->next = NULL;
+   *IPCB(segs) = ipcb;
err = ip_fragment(segs, ip_finish_output2);
 
if (err && ret == 0)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] IGMP: Inhibit reports for local multicast groups

2015-08-13 Thread Thadeu Lima de Souza Cascardo

On Thu, Aug 13, 2015 at 07:01:37PM +0200, Andrew Lunn wrote:
> On Thu, Aug 13, 2015 at 04:52:32PM +, Philip Downey wrote:
> > Hi Andrew
> > IGMP snooping is designed to prevent hosts on a local network from 
> > receiving traffic for a multicast group they have not explicitly joined.   
> > Link-Local multicast traffic should not have an IGMP client since it is 
> > reserved for routing protocols.  One would expect that IGMP snooping needs 
> > to ignore local multicast traffic in the reserved range intended for 
> > routers since there should be no IGMP client to make "join" requests.
> 
> The point of this patch is that Linux is sending out group membership
> for these addresses, it is acting as a client. What happens with a
> switch which is applying IGMP snooping to link-local multicast groups?
> You turn on this feature, and you no longer get your routing protocol
> messages.
> 
> I had a quick look at RFC 3376. The only mention i spotted for not
> sending IGMP messages is:
> 
>The all-systems multicast address, 224.0.0.1, is handled as a special
>case.  On all systems -- that is all hosts and routers, including
>multicast routers -- reception of packets destined to the all-systems
>multicast address, from all sources, is permanently enabled on all
>interfaces on which multicast reception is supported.  No IGMP
>messages are ever sent regarding the all-systems multicast address.
> 
> IGMP v2 has something similar:
> 
>The all-systems group (address 224.0.0.1) is handled as a special
>case.  The host starts in Idle Member state for that group on every
>interface, never transitions to another state, and never sends a
>report for that group.
> 
> But i did not find anything which says all other link-local addresses
> don't need member reports. Did i miss something?
> 
>   Andrew

>From RFC 4541 (Considerations for Internet Group Management Protocol (IGMP) and
Multicast Listener Discovery (MLD) Snooping Switches):

 2) Packets with a destination IP (DIP) address in the 224.0.0.X range
  which are not IGMP must be forwarded on all ports.

  This recommendation is based on the fact that many host systems do
  not send Join IP multicast addresses in this range before sending
  or listening to IP multicast packets.  Furthermore, since the
  224.0.0.X address range is defined as link-local (not to be
  routed), it seems unnecessary to keep the state for each address
  in this range.  Additionally, some routers operate in the
  224.0.0.X address range without issuing IGMP Joins, and these
  applications would break if the switch were to prune them due to
  not having seen a Join Group message from the router.

So, it looks like some hosts and routers out there in the field do not send
joins for those local addresses. In fact, IPv4 local multicast addresses are
ignored when Linux bridge multicast snooping adds a new group.

static int br_ip4_multicast_add_group(struct net_bridge *br,
...
if (ipv4_is_local_multicast(group))
return 0;

Cascardo.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] IGMP: Inhibit reports for local multicast groups

2015-08-13 Thread Thadeu Lima de Souza Cascardo

On Thu, Aug 13, 2015 at 07:01:37PM +0200, Andrew Lunn wrote:
 On Thu, Aug 13, 2015 at 04:52:32PM +, Philip Downey wrote:
  Hi Andrew
  IGMP snooping is designed to prevent hosts on a local network from 
  receiving traffic for a multicast group they have not explicitly joined.   
  Link-Local multicast traffic should not have an IGMP client since it is 
  reserved for routing protocols.  One would expect that IGMP snooping needs 
  to ignore local multicast traffic in the reserved range intended for 
  routers since there should be no IGMP client to make join requests.
 
 The point of this patch is that Linux is sending out group membership
 for these addresses, it is acting as a client. What happens with a
 switch which is applying IGMP snooping to link-local multicast groups?
 You turn on this feature, and you no longer get your routing protocol
 messages.
 
 I had a quick look at RFC 3376. The only mention i spotted for not
 sending IGMP messages is:
 
The all-systems multicast address, 224.0.0.1, is handled as a special
case.  On all systems -- that is all hosts and routers, including
multicast routers -- reception of packets destined to the all-systems
multicast address, from all sources, is permanently enabled on all
interfaces on which multicast reception is supported.  No IGMP
messages are ever sent regarding the all-systems multicast address.
 
 IGMP v2 has something similar:
 
The all-systems group (address 224.0.0.1) is handled as a special
case.  The host starts in Idle Member state for that group on every
interface, never transitions to another state, and never sends a
report for that group.
 
 But i did not find anything which says all other link-local addresses
 don't need member reports. Did i miss something?
 
   Andrew

From RFC 4541 (Considerations for Internet Group Management Protocol (IGMP) and
Multicast Listener Discovery (MLD) Snooping Switches):

 2) Packets with a destination IP (DIP) address in the 224.0.0.X range
  which are not IGMP must be forwarded on all ports.

  This recommendation is based on the fact that many host systems do
  not send Join IP multicast addresses in this range before sending
  or listening to IP multicast packets.  Furthermore, since the
  224.0.0.X address range is defined as link-local (not to be
  routed), it seems unnecessary to keep the state for each address
  in this range.  Additionally, some routers operate in the
  224.0.0.X address range without issuing IGMP Joins, and these
  applications would break if the switch were to prune them due to
  not having seen a Join Group message from the router.

So, it looks like some hosts and routers out there in the field do not send
joins for those local addresses. In fact, IPv4 local multicast addresses are
ignored when Linux bridge multicast snooping adds a new group.

static int br_ip4_multicast_add_group(struct net_bridge *br,
...
if (ipv4_is_local_multicast(group))
return 0;

Cascardo.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 >

1 - 100 of 216 matches

Mail list logo