Re: [PATCH] net: hns: use put_device() if device_register fail

2018-03-12 Thread Richard Weinberger
On Mon, Mar 12, 2018 at 5:27 PM, arvindY  wrote:
>
>
> On Monday 12 March 2018 08:13 PM, David Miller wrote:
>>
>> From: Arvind Yadav 
>> Date: Fri,  9 Mar 2018 16:11:17 +0530
>>
>>> if device_register() returned an error! Always use put_device()
>>> to give up the reference initialized.
>>>
>>> Signed-off-by: Arvind Yadav 
>>
>> I do not see anything giving cls_dev an initial non-zero reference
>> count before this device_register() call.
>
> Yes,  you are correct there is nothing to release (hnae_release).

Wait, this is not what DaveM said.
Since you sent also patches to MTD I care about this too.

Do we have to call put_device() in any case after a failure of
device_register() or not?
In this case the release function is empty, so nothing is going to released.
But technically a put_device() is needed and correct according to your
change log.

I have to admit I don't know all details of the driver core, maybe you
can help me.
device_register() calls device_initialize() followed by device_add().
device_initialize() does a kobject_init() which again sets the
reference counter to 1 via kref_init().
In the next step device_add() does a get_device() --> reference
counter goes up to 2.
But in the exit path of the function a put_device() is done, also in
case of an error.
So device_register() always returns with reference count 1.

If I understand this correctly a put_device() is mandatory.
Also in this driver.
Can you please enlighten me? :-)

-- 
Thanks,
//richard


Re: [PATCH 5/5] mtd: Stop updating erase_info->state and calling mtd_erase_callback()

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:11 CET schrieb Boris Brezillon:
> MTD users are no longer checking erase_info->state to determine if the
> erase operation failed or succeeded. Moreover, mtd_erase_callback() is
> now a NOP.
> 
> We can safely get rid of all mtd_erase_callback() calls and all
> erase_info->state assignments. While at it, get rid of the
> erase_info->state field, all MTD_ERASE_XXX definitions and the
> mtd_erase_callback() function.
> 
> Signed-off-by: Boris Brezillon <boris.brezil...@bootlin.com>

Reviewed-by: Richard Weinberger <rich...@nod.at>

Thanks,
//richard



Re: [PATCH 4/5] mtd: Unconditionally update ->fail_addr and ->addr in part_erase()

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:10 CET schrieb Boris Brezillon:
> ->fail_addr and ->addr can be updated no matter the result of
> parent->_erase(), we just need to remove the code doing the same thing
> in mtd_erase_callback() to avoid adjusting those fields twice.
> 
> Note that this can be done because all MTD users have been converted to
> not pass an erase_info->callback() and are thus only taking the
> ->addr_fail and ->addr fields into account after part_erase() has
> returned.
> 
> While we're at it, get rid of the erase_info->mtd field which was only
> needed to let mtd_erase_callback() get the partition device back.
> 
> Signed-off-by: Boris Brezillon <boris.brezil...@bootlin.com>

Reviewed-by: Richard Weinberger <rich...@nod.at>

Thanks,
//richard


Re: [PATCH 3/5] mtd: Stop assuming mtd_erase() is asynchronous

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:09 CET schrieb Boris Brezillon:
> None of the mtd->_erase() implementations work in an asynchronous manner,
> so let's simplify MTD users that call mtd_erase(). All they need to do
> is check the value returned by mtd_erase() and assume that != 0 means
> failure.
> 
> Signed-off-by: Boris Brezillon <boris.brezil...@bootlin.com>

Reviewed-by: Richard Weinberger <rich...@nod.at>

Thanks,
//richard




Re: [PATCH 1/5] mtd: Initialize ->fail_addr early in mtd_erase()

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:07 CET schrieb Boris Brezillon:
> mtd_erase() can return an error before ->fail_addr is initialized to
> MTD_FAIL_ADDR_UNKNOWN. Move this initialization at the very beginning
> of the function.
> 
> Signed-off-by: Boris Brezillon <boris.brezil...@bootlin.com>
> ---
>  drivers/mtd/mtdcore.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index a1c94526fb88..c87859ff338b 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -953,6 +953,8 @@ EXPORT_SYMBOL_GPL(__put_mtd_device);
>   */
>  int mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
>  {
> + instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
> +
>   if (!mtd->erasesize || !mtd->_erase)
>   return -ENOTSUPP;
> 
> @@ -961,7 +963,6 @@ int mtd_erase(struct mtd_info *mtd, struct erase_info
> *instr) if (!(mtd->flags & MTD_WRITEABLE))
>   return -EROFS;
> 
> - instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
>   if (!instr->len) {
>   instr->state = MTD_ERASE_DONE;
>   mtd_erase_callback(instr);

Reviewed-by: Richard Weinberger <rich...@nod.at>

Thanks,
//richard


Re: [PATCH 2/5] mtd: Get rid of unused fields in struct erase_info

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:08 CET schrieb Boris Brezillon:
> Some fields are not used by MTD drivers, users or core code. Moreover,
> those fields are not documented, so get rid of them to avoid any
> confusion.
> 
> Signed-off-by: Boris Brezillon <boris.brezil...@bootlin.com>
> ---
>  include/linux/mtd/mtd.h | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> index 205ededccc60..2a407dc9beaa 100644
> --- a/include/linux/mtd/mtd.h
> +++ b/include/linux/mtd/mtd.h
> @@ -48,14 +48,9 @@ struct erase_info {
>   uint64_t addr;
>   uint64_t len;
>   uint64_t fail_addr;
> - u_long time;
> - u_long retries;
> - unsigned dev;
> - unsigned cell;
>   void (*callback) (struct erase_info *self);
>   u_long priv;
>   u_char state;
> - struct erase_info *next;
>  };
> 
>  struct mtd_erase_region_info {

Reviewed-by: Richard Weinberger <rich...@nod.at>

Thanks,
//richard



Re: [PATCH bpf 2/3] bpf: fix build issues on um due to mising bpf_perf_event.h

2017-12-11 Thread Richard Weinberger
Am Dienstag, 12. Dezember 2017, 02:25:31 CET schrieb Daniel Borkmann:
> Since c895f6f703ad ("bpf: correct broken uapi for
> BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
> on i386 or x86_64:
> 
>   [...]
> CC  init/main.o
>   In file included from ../include/linux/perf_event.h:18:0,
>from ../include/linux/trace_events.h:10,
>from ../include/trace/syscall.h:7,
>from ../include/linux/syscalls.h:82,
>from ../init/main.c:20:
>   ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
>   asm/bpf_perf_event.h: No such file or directory #include
>   
>   [...]
> 
> Lets add missing bpf_perf_event.h also to um arch. This seems
> to be the only one still missing.
> 
> Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT
> program type") Reported-by: Randy Dunlap <rdun...@infradead.org>
> Suggested-by: Richard Weinberger <rich...@sigma-star.at>
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> Tested-by: Randy Dunlap <rdun...@infradead.org>
> Cc: Hendrik Brueckner <brueck...@linux.vnet.ibm.com>
> Cc: Richard Weinberger <rich...@sigma-star.at>
> Acked-by: Alexei Starovoitov <a...@kernel.org>
> ---
>  arch/um/include/asm/Kbuild | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
> index 50a32c3..73c57f6 100644
> --- a/arch/um/include/asm/Kbuild
> +++ b/arch/um/include/asm/Kbuild
> @@ -1,4 +1,5 @@
>  generic-y += barrier.h
> +generic-y += bpf_perf_event.h
>  generic-y += bug.h
>  generic-y += clkdev.h
>  generic-y += current.h

Acked-by: Richard Weinberger <rich...@nod.at>

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y


Re: Linux 4.15-rc3 (uml + bpf_perf_event.h)

2017-12-11 Thread Richard Weinberger
Am Montag, 11. Dezember 2017, 11:19:54 CET schrieb Daniel Borkmann:
> Hi Randy, hi Richard, [ +Hendrik for c895f6f703ad7dd2f ]
> 
> On 12/11/2017 09:32 AM, Richard Weinberger wrote:
> > Randy,
> > 
> > Am Montag, 11. Dezember 2017, 03:42:12 CET schrieb Randy Dunlap:
> >> On 12/10/2017 06:08 PM, Linus Torvalds wrote:
> >>> Another week, another rc.
> >> 
> >> um (uml) won't build on i386 or x86_64:
> >>   CC  init/main.o
> >> 
> >> In file included from ../include/linux/perf_event.h:18:0,
> >> 
> >>  from ../include/linux/trace_events.h:10,
> >>  from ../include/trace/syscall.h:7,
> >>  from ../include/linux/syscalls.h:82,
> >> 
> >>  from ../init/main.c:20:
> >> ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
> >> asm/bpf_perf_event.h: No such file or directory #include
> >> 
> >> 
> >> ^
> >> 
> >> compilation terminated.
> >> ../scripts/Makefile.build:310: recipe for target 'init/main.o' failed
> > 
> > How do you trigger that build failure?
> > Can you share your .config?
> 
> Hmm, too bad kbuild bot doesn't catch issues on uml. I'm not too familiar
> with uml, but looks like it's the only special case where there's no
> arch/um/include/uapi/asm/. What is the usual convention to pull in such
> headers in this case? Something like the below, would that fix it for you?
> 
> Thanks for your help,
> Daniel
> 
>  arch/um/include/asm/bpf_perf_event.h | 1 +
>  include/asm-generic/bpf_perf_event.h | 1 +
>  2 files changed, 2 insertions(+)
>  create mode 100644 arch/um/include/asm/bpf_perf_event.h
>  create mode 100644 include/asm-generic/bpf_perf_event.h
> 
> diff --git a/arch/um/include/asm/bpf_perf_event.h
> b/arch/um/include/asm/bpf_perf_event.h new file mode 100644
> index 000..3097758
> --- /dev/null
> +++ b/arch/um/include/asm/bpf_perf_event.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/include/asm-generic/bpf_perf_event.h
> b/include/asm-generic/bpf_perf_event.h new file mode 100644
> index 000..67112e5
> --- /dev/null
> +++ b/include/asm-generic/bpf_perf_event.h
> @@ -0,0 +1 @@
> +#include 

Hmm, what about this?

diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 50a32c33d729..fb35ec000433 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -27,3 +27,4 @@ generic-y += trace_clock.h
 generic-y += word-at-a-time.h
 generic-y += xor.h
 generic-y += kprobes.h
+generic-y += bpf_perf_event.h

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y


Re: Linux 4.15-rc3 (uml + bpf_perf_event.h)

2017-12-11 Thread Richard Weinberger
Am Montag, 11. Dezember 2017, 18:27:40 CET schrieb Randy Dunlap:
> On 12/11/2017 02:19 AM, Daniel Borkmann wrote:
> > Hi Randy, hi Richard, [ +Hendrik for c895f6f703ad7dd2f ]
> > 
> > On 12/11/2017 09:32 AM, Richard Weinberger wrote:
> >> Randy,
> >> 
> >> Am Montag, 11. Dezember 2017, 03:42:12 CET schrieb Randy Dunlap:
> >>> On 12/10/2017 06:08 PM, Linus Torvalds wrote:
> >>>> Another week, another rc.
> >>> 
> >>> um (uml) won't build on i386 or x86_64:
> >>>   CC  init/main.o
> >>> 
> >>> In file included from ../include/linux/perf_event.h:18:0,
> >>> 
> >>>  from ../include/linux/trace_events.h:10,
> >>>  from ../include/trace/syscall.h:7,
> >>>  from ../include/linux/syscalls.h:82,
> >>> 
> >>>  from ../init/main.c:20:
> >>> ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
> >>> asm/bpf_perf_event.h: No such file or directory #include
> >>> 
> >>> 
> >>> ^
> >>> 
> >>> compilation terminated.
> >>> ../scripts/Makefile.build:310: recipe for target 'init/main.o' failed
> >> 
> >> How do you trigger that build failure?
> >> Can you share your .config?
> 
> Richard, it's just defconfig on both i386 and x86_64.

How odd, here it used to build.
...until I did a make mrproper, and tried again. ;-\

Thanks,
//richard



Re: Linux 4.15-rc3 (uml + bpf_perf_event.h)

2017-12-11 Thread Richard Weinberger
Randy,

Am Montag, 11. Dezember 2017, 03:42:12 CET schrieb Randy Dunlap:
> On 12/10/2017 06:08 PM, Linus Torvalds wrote:
> > Another week, another rc.
> 
> um (uml) won't build on i386 or x86_64:
> 
>   CC  init/main.o
> In file included from ../include/linux/perf_event.h:18:0,
>  from ../include/linux/trace_events.h:10,
>  from ../include/trace/syscall.h:7,
>  from ../include/linux/syscalls.h:82,
>  from ../init/main.c:20:
> ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
> asm/bpf_perf_event.h: No such file or directory #include
> 
> ^
> compilation terminated.
> ../scripts/Makefile.build:310: recipe for target 'init/main.o' failed
> 

How do you trigger that build failure?
Can you share your .config?
 
Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y


Re: [PATCH 1/3] bpf: Don't check for current being NULL

2017-10-16 Thread Richard Weinberger
Alexei,

Am Dienstag, 17. Oktober 2017, 00:06:08 CEST schrieb Alexei Starovoitov:
> On Mon, Oct 16, 2017 at 11:18 AM, Richard Weinberger <rich...@nod.at> wrote:
> > current is never NULL.
> > 
> > Signed-off-by: Richard Weinberger <rich...@nod.at>
> > ---
> > 
> >  kernel/bpf/helpers.c | 12 
> >  1 file changed, 12 deletions(-)
> > 
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 3d24e238221e..e8845adcd15e 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -120,9 +120,6 @@ BPF_CALL_0(bpf_get_current_pid_tgid)
> > 
> >  {
> >  
> > struct task_struct *task = current;
> > 
> > -   if (unlikely(!task))
> > -   return -EINVAL;
> > -
> 
> really? in all context? including irq and nmi?

I would be astonished current is NULL in such a context.

To be sure, let's CC linux-arch.
IIRC I talked also with Al about this and he also assumed that current
cannot be NULL.

Thanks,
//richard


Re: [PATCH 3/3] bpf: Make sure that ->comm does not change under us.

2017-10-16 Thread Richard Weinberger
Am Montag, 16. Oktober 2017, 23:02:06 CEST schrieb Daniel Borkmann:
> On 10/16/2017 10:55 PM, Richard Weinberger wrote:
> > Am Montag, 16. Oktober 2017, 22:50:43 CEST schrieb Daniel Borkmann:
> >>>   struct task_struct *task = current;
> >>> 
> >>> + task_lock(task);
> >>> 
> >>>   strncpy(buf, task->comm, size);
> >>> 
> >>> + task_unlock(task);
> >> 
> >> Wouldn't this potentially lead to a deadlock? E.g. you attach yourself
> >> to task_lock() / spin_lock() / etc, and then the BPF prog triggers the
> >> bpf_get_current_comm() taking the lock again ...
> > 
> > Yes, but doesn't the same apply to the use case when I attach to strncpy()
> > and run bpf_get_current_comm()?
> 
> You mean due to recursion? In that case trace_call_bpf() would bail out
> due to the bpf_prog_active counter.

Ah, that's true.
So, when someone wants to use bpf_get_current_comm() while tracing task_lock,
we have a problem. I agree.
On the other hand, without locking the function may return wrong results.

Thanks,
//richard



Re: [PATCH 3/3] bpf: Make sure that ->comm does not change under us.

2017-10-16 Thread Richard Weinberger
Am Montag, 16. Oktober 2017, 22:50:43 CEST schrieb Daniel Borkmann:
> > struct task_struct *task = current;
> > 
> > +   task_lock(task);
> > 
> > strncpy(buf, task->comm, size);
> > 
> > +   task_unlock(task);
> 
> Wouldn't this potentially lead to a deadlock? E.g. you attach yourself
> to task_lock() / spin_lock() / etc, and then the BPF prog triggers the
> bpf_get_current_comm() taking the lock again ...

Yes, but doesn't the same apply to the use case when I attach to strncpy()
and run bpf_get_current_comm()?

Thanks,
//richard



Re: [PATCH 2/3] bpf: Remove dead variable

2017-10-16 Thread Richard Weinberger
Am Montag, 16. Oktober 2017, 21:11:47 CEST schrieb Daniel Borkmann: 
> > I can squash it into 1/3, I kept it that way because
> > even without 1/3 this variable is unused.
> 
> Hmm, the helper looks like the below. In patch 1/3 you removed
> the 'if (unlikely(!task))' test where the variable was used before,
> so 2/3 without the 1/3 would result in a compile error.

Why a compile error? It emits a warning.

> BPF_CALL_0(bpf_get_current_uid_gid)
> {
>   struct task_struct *task = current;
>   kuid_t uid;
>   kgid_t gid;
> 
>   if (unlikely(!task))
>   return -EINVAL;

Well, this is the only "user". Okay.

>   current_uid_gid(, );

Here we use current. So, task was always in vain.

So, I can happily squash 2/3 into 1/3 and resent.
The series just represented the way I've worked on the code... 

Thanks,
//richard




Re: [PATCH 2/3] bpf: Remove dead variable

2017-10-16 Thread Richard Weinberger
Am Montag, 16. Oktober 2017, 20:54:36 CEST schrieb Daniel Borkmann:
> On 10/16/2017 08:18 PM, Richard Weinberger wrote:
> > task is never used in bpf_get_current_uid_gid(), kill it.
> > 
> > Signed-off-by: Richard Weinberger <rich...@nod.at>
> > ---
> > 
> >   kernel/bpf/helpers.c | 1 -
> >   1 file changed, 1 deletion(-)
> > 
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index e8845adcd15e..511c9d522cfc 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -131,7 +131,6 @@ const struct bpf_func_proto
> > bpf_get_current_pid_tgid_proto = {> 
> >   BPF_CALL_0(bpf_get_current_uid_gid)
> >   {
> > 
> > -   struct task_struct *task = current;
> > 
> > kuid_t uid;
> > kgid_t gid;
> 
> Needs to be squashed into patch 1/3?

I can squash it into 1/3, I kept it that way because
even without 1/3 this variable is unused.

Thanks,
//richard



[PATCH 3/3] bpf: Make sure that ->comm does not change under us.

2017-10-16 Thread Richard Weinberger
Sadly we cannot use get_task_comm() since bpf_get_current_comm()
allows truncation.

Signed-off-by: Richard Weinberger <rich...@nod.at>
---
 kernel/bpf/helpers.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 511c9d522cfc..4b042b24524d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* If kernel subsystem is allowing eBPF programs to call this function,
  * inside its own verifier_ops->get_func_proto() callback it should return
@@ -149,7 +150,9 @@ BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
 {
struct task_struct *task = current;
 
+   task_lock(task);
strncpy(buf, task->comm, size);
+   task_unlock(task);
 
/* Verifier guarantees that size > 0. For task->comm exceeding
 * size, guarantee that buf is %NUL-terminated. Unconditionally
-- 
2.13.6



[PATCH 1/3] bpf: Don't check for current being NULL

2017-10-16 Thread Richard Weinberger
current is never NULL.

Signed-off-by: Richard Weinberger <rich...@nod.at>
---
 kernel/bpf/helpers.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3d24e238221e..e8845adcd15e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -120,9 +120,6 @@ BPF_CALL_0(bpf_get_current_pid_tgid)
 {
struct task_struct *task = current;
 
-   if (unlikely(!task))
-   return -EINVAL;
-
return (u64) task->tgid << 32 | task->pid;
 }
 
@@ -138,9 +135,6 @@ BPF_CALL_0(bpf_get_current_uid_gid)
kuid_t uid;
kgid_t gid;
 
-   if (unlikely(!task))
-   return -EINVAL;
-
current_uid_gid(, );
return (u64) from_kgid(_user_ns, gid) << 32 |
 from_kuid(_user_ns, uid);
@@ -156,9 +150,6 @@ BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
 {
struct task_struct *task = current;
 
-   if (unlikely(!task))
-   goto err_clear;
-
strncpy(buf, task->comm, size);
 
/* Verifier guarantees that size > 0. For task->comm exceeding
@@ -167,9 +158,6 @@ BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
 */
buf[size - 1] = 0;
return 0;
-err_clear:
-   memset(buf, 0, size);
-   return -EINVAL;
 }
 
 const struct bpf_func_proto bpf_get_current_comm_proto = {
-- 
2.13.6



[PATCH 2/3] bpf: Remove dead variable

2017-10-16 Thread Richard Weinberger
task is never used in bpf_get_current_uid_gid(), kill it.

Signed-off-by: Richard Weinberger <rich...@nod.at>
---
 kernel/bpf/helpers.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e8845adcd15e..511c9d522cfc 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -131,7 +131,6 @@ const struct bpf_func_proto bpf_get_current_pid_tgid_proto 
= {
 
 BPF_CALL_0(bpf_get_current_uid_gid)
 {
-   struct task_struct *task = current;
kuid_t uid;
kgid_t gid;
 
-- 
2.13.6



Re: [PATCH] bpf: devmap: Check attr->max_entries more carefully

2017-10-15 Thread Richard Weinberger
Am Montag, 16. Oktober 2017, 00:00:20 CEST schrieb Richard Weinberger:
> max_entries is user controlled and used as input for __alloc_percpu().
> This function expects that the allocation size is a power of two and
> less than PCPU_MIN_UNIT_SIZE.
> Otherwise a WARN() is triggered.

On a second though, I think we should also have a hard limit for ->max_entries 
or check for an overflow here:

static u64 dev_map_bitmap_size(const union bpf_attr *attr)
{
return BITS_TO_LONGS(attr->max_entries) * sizeof(unsigned long);
}

Thanks,
//richard



[PATCH] bpf: devmap: Check attr->max_entries more carefully

2017-10-15 Thread Richard Weinberger
max_entries is user controlled and used as input for __alloc_percpu().
This function expects that the allocation size is a power of two and
less than PCPU_MIN_UNIT_SIZE.
Otherwise a WARN() is triggered.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Reported-by: Shankara Pailoor <sp3...@columbia.edu>
Reported-by: syzkaller <syzkal...@googlegroups.com>
Signed-off-by: Richard Weinberger <rich...@nod.at>
---
 kernel/bpf/devmap.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e093d9a2c4dd..6ce00083103b 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -49,6 +49,7 @@
  */
 #include 
 #include 
+#include 
 
 struct bpf_dtab_netdev {
struct net_device *dev;
@@ -77,6 +78,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
struct bpf_dtab *dtab;
int err = -EINVAL;
u64 cost;
+   size_t palloc_size;
 
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -95,9 +97,14 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
dtab->map.map_flags = attr->map_flags;
dtab->map.numa_node = bpf_map_attr_numa_node(attr);
 
+   palloc_size = roundup_pow_of_two(dev_map_bitmap_size(attr));
+   if (palloc_size > PCPU_MIN_UNIT_SIZE ||
+   palloc_size < dev_map_bitmap_size(attr))
+   return ERR_PTR(-EINVAL);
+
/* make sure page count doesn't overflow */
cost = (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *);
-   cost += dev_map_bitmap_size(attr) * num_possible_cpus();
+   cost += palloc_size * num_possible_cpus();
if (cost >= U32_MAX - PAGE_SIZE)
goto free_dtab;
 
@@ -111,7 +118,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
err = -ENOMEM;
 
/* A per cpu bitfield with a bit per possible net device */
-   dtab->flush_needed = __alloc_percpu(dev_map_bitmap_size(attr),
+   dtab->flush_needed = __alloc_percpu(palloc_size,
__alignof__(unsigned long));
if (!dtab->flush_needed)
goto free_dtab;
-- 
2.13.6



Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50

2017-09-30 Thread Richard Weinberger
Josh,

Am Mittwoch, 27. September 2017, 16:14:30 CEST schrieb Josh Poimboeuf:
> On Wed, Sep 27, 2017 at 08:51:22AM +0200, Richard Weinberger wrote:
> > Am Mittwoch, 27. September 2017, 00:42:46 CEST schrieb Josh Poimboeuf:
> > > > Here is another variant of the warning, it matches the attached 
.config:
> > > I can take a look at it.  Unfortunately, for these types of issues I
> > > often need the vmlinux file to be able to make sense of the unwinder
> > > dump.  So if you happen to have somewhere to copy the vmlinux to, that
> > > would be helpful.  Or if you give me your GCC version I can try to
> > > rebuild it locally.
> > 
> > There you go:
> > http://git.infradead.org/~rw/bpf_splat/vmlinux.xz
> 
> Thanks.  Can you test this fix?
> 
> 
> diff --git a/arch/x86/kernel/kprobes/common.h
> b/arch/x86/kernel/kprobes/common.h index db2182d63ed0..3fc0f9a794cb 100644
> --- a/arch/x86/kernel/kprobes/common.h
> +++ b/arch/x86/kernel/kprobes/common.h
> @@ -3,6 +3,15 @@
> 
>  /* Kprobes and Optprobes common header */
> 
> +#include 
> +
> +#ifdef CONFIG_FRAME_POINTER
> +# define SAVE_RBP_STRING "   push %" _ASM_BP "\n" \
> +  "  mov  %" _ASM_SP ", %" _ASM_BP "\n"
> +#else
> +# define SAVE_RBP_STRING "   push %" _ASM_BP "\n"
> +#endif
> +
>  #ifdef CONFIG_X86_64
>  #define SAVE_REGS_STRING \
>   /* Skip cs, ip, orig_ax. */ \
> @@ -17,7 +26,7 @@
>   "   pushq %r10\n"   \
>   "   pushq %r11\n"   \
>   "   pushq %rbx\n"   \
> - "   pushq %rbp\n"   \
> + SAVE_RBP_STRING \
>   "   pushq %r12\n"   \
>   "   pushq %r13\n"   \
>   "   pushq %r14\n"   \
> @@ -48,7 +57,7 @@
>   "   pushl %es\n"\
>   "   pushl %ds\n"\
>   "   pushl %eax\n"   \
> - "   pushl %ebp\n"   \
> + SAVE_RBP_STRING \
>   "   pushl %edi\n"   \
>   "   pushl %esi\n"   \
>   "   pushl %edx\n"   \

This fixes the issue for me!

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y


Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50

2017-09-27 Thread Richard Weinberger
Am Mittwoch, 27. September 2017, 00:42:46 CEST schrieb Josh Poimboeuf:
> > Here is another variant of the warning, it matches the attached .config:
> I can take a look at it.  Unfortunately, for these types of issues I
> often need the vmlinux file to be able to make sense of the unwinder
> dump.  So if you happen to have somewhere to copy the vmlinux to, that
> would be helpful.  Or if you give me your GCC version I can try to
> rebuild it locally.

There you go:
http://git.infradead.org/~rw/bpf_splat/vmlinux.xz

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y


Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50

2017-09-26 Thread Richard Weinberger
Alexei,

CC'ing Josh and Ingo.

Am Dienstag, 26. September 2017, 06:09:02 CEST schrieb Alexei Starovoitov:
> On Mon, Sep 25, 2017 at 11:23:31PM +0200, Richard Weinberger wrote:
> > Hi!
> > 
> > While playing with bcc's opensnoop tool on Linux 4.14-rc2 I managed to
> > trigger this splat:
> > 
> > [  297.629773] WARNING: kernel stack frame pointer at 880156a5fea0 in
> > bash:2103 has bad value 7ffec7d87e50
> > [  297.629777] unwind stack type:0 next_sp:  (null) mask:0x6
> > graph_idx:0
> > [  297.629783] 88015b207ae0: 88015b207b68 (0x88015b207b68)
> > [  297.629790] 88015b207ae8: b163c00e
> > (__save_stack_trace+0x6e/
> > 0xd0)
> > [  297.629792] 88015b207af0:  ...
> > [  297.629795] 88015b207af8: 880156a58000 (0x880156a58000)
> > [  297.629799] 88015b207b00: 880156a6 (0x880156a6)
> > [  297.629800] 88015b207b08:  ...
> > [  297.629803] 88015b207b10: 0006 (0x6)
> > [  297.629806] 88015b207b18: 880151b02700 (0x880151b02700)
> > [  297.629809] 88015b207b20: 0101 (0x101)
> > [  297.629812] 88015b207b28: 880156a5fea0 (0x880156a5fea0)
> > [  297.629815] 88015b207b30: 88015b207ae0 (0x88015b207ae0)
> > [  297.629818] 88015b207b38: c0050282 (0xc0050282)
> > [  297.629819] 88015b207b40:  ...
> > [  297.629822] 88015b207b48: 0100 (0x100)
> > [  297.629825] 88015b207b50: 880157b98280 (0x880157b98280)
> > [  297.629828] 88015b207b58: 880157b98380 (0x880157b98380)
> > [  297.629831] 88015b207b60: 88015ad2b500 (0x88015ad2b500)
> > [  297.629834] 88015b207b68: 88015b207b78 (0x88015b207b78)
> > [  297.629838] 88015b207b70: b163c086
> > (save_stack_trace+0x16/0x20) [  297.629841] 88015b207b78:
> > 88015b207da8 (0x88015b207da8) [  297.629847] 88015b207b80:
> > b18a8ed6 (save_stack+0x46/0xd0) [  297.629850] 88015b207b88:
> > 004c (0x4c)
> > [  297.629852] 88015b207b90: 88015b207ba0 (0x88015b207ba0)
> > [  297.629855] 88015b207b98: 8801 (0x8801)
> > [  297.629859] 88015b207ba0: b163c086
> > (save_stack_trace+0x16/0x20) [  297.629864] 88015b207ba8:
> > b18a8ed6 (save_stack+0x46/0xd0) [  297.629868] 88015b207bb0:
> > b18a9752 (kasan_slab_free+0x72/0xc0)
> Thanks for the report!
> I'm not sure I understand what's going on here.
> It seems you have kasan enabled and it's trying to do save_stack()
> and something crashing?
> I don't see any bpf related helpers in the stack trace.
> Which architecture is this? and .config ?
> Is bpf jit enabled? If so, make sure that net.core.bpf_jit_kallsyms=1

I found some time to dig a little further.
It seems to happen only when CONFIG_DEBUG_SPINLOCK is enabled, please see the 
attached .config. The JIT is off.
KAsan is also not involved at all, the regular stack saving machinery from the 
trace framework initiates the stack unwinder.

The issue arises as soon as in pre_handler_kretprobe() raw_spin_lock_irqsave() 
is being called.
It happens on all releases that have commit c32c47c68a0a ("x86/unwind: Warn on 
bad frame pointer").
Interestingly it does not happen when I run 
samples/kprobes/kretprobe_example.ko. So, BPF must be involved somehow.

Here is another variant of the warning, it matches the attached .config:

[   42.729039] WARNING: kernel stack frame pointer at 99ef4076bea0 in 
opensnoop:2008 has bad value 0008
[   42.729041] unwind stack type:0 next_sp:  (null) mask:0x2 
graph_idx:0
[   42.729042] 99ef4076bcb0: 99ef4076bd38 (0x99ef4076bd38)
[   42.729044] 99ef4076bcb8: ac42781e (__save_stack_trace+0x6e/
0xd0)
[   42.729044] 99ef4076bcc0:  ...
[   42.729045] 99ef4076bcc8: 99ef40768000 (0x99ef40768000)
[   42.729045] 99ef4076bcd0: 99ef4076c000 (0x99ef4076c000)
[   42.729045] 99ef4076bcd8:  ...
[   42.729046] 99ef4076bce0: 0002 (0x2)
[   42.729046] 99ef4076bce8: 8a1c39163fc0 (0x8a1c39163fc0)
[   42.729047] 99ef4076bcf0: 0001 (0x1)
[   42.729047] 99ef4076bcf8: 99ef4076bea0 (0x99ef4076bea0)
[   42.729048] 99ef4076bd00: 99ef4076bcb0 (0x99ef4076bcb0)
[   42.729048] 99ef4076bd08: c00b302f (0xc00b302f)
[   42.729048] 99ef4076bd10:  ...
[   42.729049] 99ef4076bd18: 8a1c39163fc0 (0x8a1c39163fc0)
[   42.729049] 99ef4076bd20:  ...
[   42.7290

WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50

2017-09-25 Thread Richard Weinberger
Hi!

While playing with bcc's opensnoop tool on Linux 4.14-rc2 I managed to trigger 
this splat:

[  297.629773] WARNING: kernel stack frame pointer at 880156a5fea0 in 
bash:2103 has bad value 7ffec7d87e50
[  297.629777] unwind stack type:0 next_sp:  (null) mask:0x6 
graph_idx:0
[  297.629783] 88015b207ae0: 88015b207b68 (0x88015b207b68)
[  297.629790] 88015b207ae8: b163c00e (__save_stack_trace+0x6e/
0xd0)
[  297.629792] 88015b207af0:  ...
[  297.629795] 88015b207af8: 880156a58000 (0x880156a58000)
[  297.629799] 88015b207b00: 880156a6 (0x880156a6)
[  297.629800] 88015b207b08:  ...
[  297.629803] 88015b207b10: 0006 (0x6)
[  297.629806] 88015b207b18: 880151b02700 (0x880151b02700)
[  297.629809] 88015b207b20: 0101 (0x101)
[  297.629812] 88015b207b28: 880156a5fea0 (0x880156a5fea0)
[  297.629815] 88015b207b30: 88015b207ae0 (0x88015b207ae0)
[  297.629818] 88015b207b38: c0050282 (0xc0050282)
[  297.629819] 88015b207b40:  ...
[  297.629822] 88015b207b48: 0100 (0x100)
[  297.629825] 88015b207b50: 880157b98280 (0x880157b98280)
[  297.629828] 88015b207b58: 880157b98380 (0x880157b98380)
[  297.629831] 88015b207b60: 88015ad2b500 (0x88015ad2b500)
[  297.629834] 88015b207b68: 88015b207b78 (0x88015b207b78)
[  297.629838] 88015b207b70: b163c086 (save_stack_trace+0x16/0x20)
[  297.629841] 88015b207b78: 88015b207da8 (0x88015b207da8)
[  297.629847] 88015b207b80: b18a8ed6 (save_stack+0x46/0xd0)
[  297.629850] 88015b207b88: 004c (0x4c)
[  297.629852] 88015b207b90: 88015b207ba0 (0x88015b207ba0)
[  297.629855] 88015b207b98: 8801 (0x8801)
[  297.629859] 88015b207ba0: b163c086 (save_stack_trace+0x16/0x20)
[  297.629864] 88015b207ba8: b18a8ed6 (save_stack+0x46/0xd0)
[  297.629868] 88015b207bb0: b18a9752 (kasan_slab_free+0x72/0xc0)
[  297.629873] 88015b207bb8: b18a5e90 (kmem_cache_free+0x70/0x190)
[  297.629879] 88015b207bc0: b18b7e94 (file_free_rcu+0x34/0x40)
[  297.629886] 88015b207bc8: b172580c (rcu_process_callbacks
+0x2dc/0xcd0)
[  297.629892] 88015b207bd0: b2646cbc (__do_softirq+0x12c/0x343)
[  297.629897] 88015b207bd8: b1692304 (irq_exit+0xe4/0xf0)
[  297.629902] 88015b207be0: b2646446 (smp_apic_timer_interrupt
+0x86/0x1a0)
[  297.629907] 88015b207be8: b26452f3 (apic_timer_interrupt
+0x93/0xa0)
[  297.629913] 88015b207bf0: b1667417 (optimized_callback
+0x67/0x100)
[  297.629916] 88015b207bf8: c0050282 (0xc0050282)
[  297.629918] 88015b207c00:  ...
[  297.629921] 88015b207c08: 88015a77e24c (0x88015a77e24c)
[  297.629924] 88015b207c10: 88015b207c38 (0x88015b207c38)
[  297.629927] 88015b207c18: 88015b207c38 (0x88015b207c38)
[  297.629929] 88015b207c20: 0086 (0x86)
[  297.629932] 88015b207c28: 88015a77db00 (0x88015a77db00)
[  297.629935] 88015b207c30: 11002b640f91 (0x11002b640f91)
[  297.629938] 88015b207c38: 88015b207d10 (0x88015b207d10)
[  297.629945] 88015b207c40: b16c9f60 (try_to_wake_up+0xb0/0x710)
[  297.629947] 88015b207c48:  ...
[  297.629952] 88015b207c50: b2dfd3c0 (machine_ops+0x40/0x40)
[  297.629954] 88015b207c58: 88015a77df94 (0x88015a77df94)
[  297.629957] 88015b207c60: 00023540 (0x23540)
[  297.629960] 88015b207c68: 88015b215c38 (0x88015b215c38)
[  297.629963] 88015b207c70: 88015b20 (0x88015b20)
[  297.629965] 88015b207c78: 0086 (0x86)
[  297.629968] 88015b207c80: 0001 (0x1)
[  297.629971] 88015b207c88: 41b58ab3 (0x41b58ab3)
[  297.629975] 88015b207c90: b2d919f2 (.LC2+0x6e0e/0x83b5)
[  297.629981] 88015b207c98: b16c9eb0 (migrate_swap_stop
+0x2e0/0x2e0)
[  297.629986] 88015b207ca0: b16d0f73 (account_entity_dequeue
+0x73/0x110)
[  297.629989] 88015b207ca8: 0010 (0x10)
[  297.629992] 88015b207cb0: 88015b2235a0 (0x88015b2235a0)
[  297.629994] 88015b207cb8: 88015061e280 (0x88015061e280)
[  297.629997] 88015b207cc0: 88015b207ce8 (0x88015b207ce8)
[  297.630003] 88015b207cc8: b16c87ed (sched_avg_update+0x2d/0x90)
[  297.630005] 88015b207cd0: 0005 (0x5)
[  297.630008] 88015b207cd8: 88015b223570 (0x88015b223570)
[  297.630010] 88015b207ce0: 00dd (0xdd)
[  297.630013] 88015b207ce8: 88015a017ea0 (0x88015a017ea0)
[  297.630021] 88015b207cf0: b30b7128 (rcu_sched_state

Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-07-12 Thread Richard Weinberger
Florian,

Am 01.07.2017 um 12:35 schrieb Florian Westphal:
>>> Perhaps we can place that in a new extension (its not needed in any
>>> fastpath ops)?
>>
>> To get rid of the infoleak we have to re-introduce the id field in struct 
>> nf_conn
>> and struct nf_conntrack_expect.
> 
> Why will this not work?

You are right, when we compute the ID from the whole object, it should be fine.

>> Otherwise have nothing to compare against in the conntrack/expect remove 
>> case.
> 
> Not following, sorry.  The id is not used anywhere except when we send
> info to userspace.
> 
> The compare on removal is not needed afaics, and its also not used when
> doing lookup to begin with, so we can just recompute it?

Isn't this a way too much overhead?

I personally favor Pablo's per-cpu counter approach.
That way the IDs are unique again and we get rid of the info leak without
much effort.

Thanks,
//richard


Re: [PATCH v3 net-next 0/4] kernel TLS

2017-07-06 Thread Richard Weinberger
Dave,

On Wed, Jun 14, 2017 at 8:36 PM, Dave Watson  wrote:
>  Documentation/networking/tls.txt   | 135 +++
>  MAINTAINERS|  10 +
>  include/linux/socket.h |   1 +
>  include/net/inet_connection_sock.h |   4 +
>  include/net/tcp.h  |  27 ++
>  include/net/tls.h  | 237 
>  include/uapi/linux/tcp.h   |   1 +
>  include/uapi/linux/tls.h   |  79 
>  net/Kconfig|   1 +
>  net/Makefile   |   1 +
>  net/ipv4/Makefile  |   2 +-
>  net/ipv4/sysctl_net_ipv4.c |  25 ++
>  net/ipv4/tcp.c |  33 +-
>  net/ipv4/tcp_ipv4.c|   2 +
>  net/ipv4/tcp_rate.c|   1 +
>  net/ipv4/tcp_ulp.c | 134 +++
>  net/tls/Kconfig|  12 +
>  net/tls/Makefile   |   7 +
>  net/tls/tls_main.c | 487 +++
>  net/tls/tls_sw.c   | 772 
> +
>  20 files changed, 1968 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/networking/tls.txt
>  create mode 100644 include/net/tls.h
>  create mode 100644 include/uapi/linux/tls.h
>  create mode 100644 net/ipv4/tcp_ulp.c
>  create mode 100644 net/tls/Kconfig
>  create mode 100644 net/tls/Makefile
>  create mode 100644 net/tls/tls_main.c
>  create mode 100644 net/tls/tls_sw.c

Sorry for the late question. Do I miss something or is this IPv4 only?

-- 
Thanks,
//richard


Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-06-30 Thread Richard Weinberger
Florian,

Am 30.06.2017 um 21:55 schrieb Florian Westphal:
>>> Why not use a hash of the address?
>>
>> Would also work. Or xor it with a random number.
>>
>> On the other hand, for user space it would be more useful when the conntrack 
>> id
>> does not repeat that often. That's why I favor the good old counter method.
>> Currently the conntrack id is reused very fast.
>> e.g. in one of our applications we use the conntack id via NFQUEUE and watch 
>> the
>> destroy events via conntrack. It happens regularly that a new connection has 
>> the
>> same id than a different connection we saw some moments before, before we 
>> receive
>> the destroy event from the conntrack socket.
> 
> Perhaps we can place that in a new extension (its not needed in any
> fastpath ops)?

To get rid of the infoleak we have to re-introduce the id field in struct 
nf_conn
and struct nf_conntrack_expect.
Otherwise have nothing to compare against in the conntrack/expect remove case.

So the only question is what to store, IMHO a counter that can wrap around is 
the
cheapest method and would also not harm the fast-path.

Thanks,
//richard


Re: nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-06-30 Thread Richard Weinberger
Florian,

Am 30.06.2017 um 21:35 schrieb Florian Westphal:
> Richard Weinberger <rich...@nod.at> wrote:
>> Hi!
>>
>> I noticed that nf_conntrack leaks kernel addresses, it uses the memory 
>> address
>> as identifier used for generating conntrack and expect ids..
>> Since these ids are also visible to unprivileged users via network namespaces
>> I suggest reverting these commits:
> 
> Why not use a hash of the address?

Would also work. Or xor it with a random number.

On the other hand, for user space it would be more useful when the conntrack id
does not repeat that often. That's why I favor the good old counter method.
Currently the conntrack id is reused very fast.
e.g. in one of our applications we use the conntack id via NFQUEUE and watch the
destroy events via conntrack. It happens regularly that a new connection has the
same id than a different connection we saw some moments before, before we 
receive
the destroy event from the conntrack socket.

Thanks,
//richard


nf_conntrack: Infoleak via CTA_ID and CTA_EXPECT_ID

2017-06-30 Thread Richard Weinberger
Hi!

I noticed that nf_conntrack leaks kernel addresses, it uses the memory address
as identifier used for generating conntrack and expect ids..
Since these ids are also visible to unprivileged users via network namespaces
I suggest reverting these commits:

commit 7f85f914721ffcef382a57995182916bd43d8a65
Author: Patrick McHardy 
Date:   Fri Sep 28 14:41:27 2007 -0700

[NETFILTER]: nf_conntrack: kill unique ID

Remove the per-conntrack ID, its not necessary anymore for dumping.
For compatiblity reasons we send the address of the conntrack to
userspace as ID.

Signed-off-by: Patrick McHardy 
Signed-off-by: David S. Miller 

commit 3583240249ef354760e04ae49bd7b462a638f40c
Author: Patrick McHardy 
Date:   Fri Sep 28 14:41:50 2007 -0700

[NETFILTER]: nf_conntrack_expect: kill unique ID

Similar to the conntrack ID, the per-expectation ID is not needed
anymore, kill it.

Signed-off-by: Patrick McHardy 
Signed-off-by: David S. Miller 

Thanks,
//richard


Re: [PATCH 4/4] mtd: nand: nandsim: convert to memalloc_noreclaim_*()

2017-04-05 Thread Richard Weinberger
Michal,

Am 05.04.2017 um 13:31 schrieb Michal Hocko:
> On Wed 05-04-17 09:47:00, Vlastimil Babka wrote:
>> Nandsim has own functions set_memalloc() and clear_memalloc() for robust
>> setting and clearing of PF_MEMALLOC. Replace them by the new generic helpers.
>> No functional change.
> 
> This one smells like an abuser. Why the hell should read/write path
> touch memory reserves at all!

Could be. Let's ask Adrian, AFAIK he wrote that code.
Adrian, can you please clarify why nandsim needs to play with PF_MEMALLOC?

Thanks,
//richard


Re: [RFC 0/4] RFC: Add Checmate, BPF-driven minor LSM

2016-08-04 Thread Richard Weinberger
Sargun,

On Thu, Aug 4, 2016 at 9:11 AM, Sargun Dhillon  wrote:
> I distributed this patchset to linux-security-mod...@vger.kernel.org earlier,
> but based on the fact that the archive is down, and this is a fairly
> broad-sweeping proposal, I figured I'd grow the audience a little bit. Sorry
> if you received this multiple times.
>
> I've begun building out the skeleton of a Linux Security Module, and I'd like 
> to
> get feedback on it. It's a skeleton, and I've only populated a few hooks, so 
> I'm
> mostly looking for input on the general proposal, interest, and design. It's a
> minor LSM. My particular use case is one in which containers are being
> dynamically deployed to machines by internal developers in a different group.
> The point of Checmate is to act as an extensible bed for _safe_, complex
> security policies. It's nice to enable dynamic security policies that can be
> defined in C, and change as neccessary, without ever having to patch, or 
> rebuild
> the kernel.
>
> For many of these containers, the security policies can be fairly nuanced. One
> particular one to take into account is network security. Often times,
> administrators want to prevent ingress, and egress connectivity except from a
> few select IPs. Egress filtering can be managed using net_cls, but without
> modifying running software, it's non-trivial to attach a filter to all sockets
> being created within a container. The inet_conn_request, socket_recvmsg,
> socket_sock_rcv_skb hooks make this trivial to implement.

What is wrong with having firewall rules per container?
Either by matching the container IP or an interface...

> Other times, containers need to be throttled in places where there's not 
> really
> a good place to impose that policy for software which isn't built in-house.  
> If
> one wants to limit file creations/sec, or reject I/O under certain
> characteristics, there's not a great place to do it now. This gives engineers 
> a
> mechanism to write those policies.

Hmm, not sure if resource control is something we want to do with an LSM.

> This same flexibility can be used to take existing programs and enable safe 
> BPF
> helpers to modify memory to allow rules to pass. One example that I prototyped
> was Docker's port mapping, which has an overhead (DNAT), and there's some loss
> of fidelity in the BSD Socket API to identify what's going on. Instead, we can
> just rewrite the port in a bind, based upon some data in a BPF map, and a 
> cgroup
> match.
>
> I can actually see other minor security modules being implemented in Checmate,
> for example, Yama, or the recently proposed Hardchroot could be reimplemented 
> in
> BPF. Potentially, they could even be API compatible.
>
> Although, at first, much of this sounds like seccomp, it's quite different. 
> For
> one, what we can do in the security hooks is more complex (access to kernel
> pointers). The other side of this is we can have effects on a system-wide,
> or cgroup level. This also circumvents the need for CRIU-friendly policies.

It is like seccomp except that you have a single rule set and target LSM hooks
instead of syscalls, right?

-- 
Thanks,
//richard


Re: [RFC] WireGuard: next generation secure network tunnel

2016-07-01 Thread Richard Weinberger
Jason,

Am 01.07.2016 um 16:25 schrieb Jason A. Donenfeld:
> Hi Richard,
> 
> On Fri, Jul 1, 2016 at 1:42 PM, Richard Weinberger
> <richard.weinber...@gmail.com> wrote:
>> So every logical tunnel will allocate a new net device?
>> Doesn't this scale badly? I have ipsec alike setups
>> with many, many road warriors in mind.
> 
> No, this isn't the case. Each net device has multiple peers. Check out
> the example config on the website, pasted here for convenience:
> 
>> [Interface]
>> PrivateKey = yAnz5TF+lXXJte14tji3zlMNq+hd2rYUIgJBgB3fBmk=
>> ListenPort = 41414
>>
>> [Peer]
>> PublicKey = xTIBA5rboUvnH4htodjb6e697QjLERt1NAB4mZqp8Dg=
>> AllowedIPs = 10.192.122.3/32, 10.192.124.1/24
>>
>> [Peer]
>> PublicKey = TrMvSoP4jYQlY6RIzBgbssQqY3vxI2Pi+y71lOWWXX0=
>> AllowedIPs = 10.192.122.4/32, 192.168.0.0/16
>>
>> [Peer]
>> PublicKey = gN65BkIKy1eCE9pP1wdc8ROUtkHLF2PfAqYdyYBz6EA=
>> AllowedIPs = 10.10.10.230/32
> 
> If that file is example.conf, you could set up a single device like this:
> 
> $ ip link add dev wg0 type wireguard
> $ wg setconf wg0 example.conf
> 
> That single netdev is now configured to communicate with several peers.
> 
> I hope this clarifies things. Let me know if you have further questions.

Yes. Makes sense. :-)

Thanks,
//richard


Re: [RFC] WireGuard: next generation secure network tunnel

2016-07-01 Thread Richard Weinberger
On Tue, Jun 28, 2016 at 4:49 PM, Jason A. Donenfeld  wrote:
> WireGuard acts as a virtual interface, doing layer 3 IP tunneling,
> addable with "ip link add dev wg0 type wireguard". You can set the
> interface's local IP and routes using the usual ip-address and

So every logical tunnel will allocate a new net device?
Doesn't this scale badly? I have ipsec alike setups
with many, many road warriors in mind.

-- 
Thanks,
//richard


Re: [PATCH net-next 0/8] tou: Transports over UDP - part I

2016-06-23 Thread Richard Weinberger
Am 23.06.2016 um 09:40 schrieb David Miller:
> From: Richard Weinberger <richard.weinber...@gmail.com>
> Date: Thu, 23 Jun 2016 00:15:04 +0200
> 
>> On Thu, Jun 16, 2016 at 7:51 PM, Tom Herbert <t...@herbertland.com> wrote:
>>> Transports over UDP is intended to encapsulate TCP and other transport
>>> protocols directly and securely in UDP.
>>>
>>> The goal of this work is twofold:
>>>
>>> 1) Allow applications to run their own transport layer stack (i.e.from
>>>userspace). This eliminates dependencies on the OS (e.g. solves a
>>>major dependency issue for Facebook on clients).
>>
>> Facebook on clients would be a Facebook app on mobile devices?
>> Does that mean that the Facebook app is so advanced and complicated
>> that it needs a special TCP stack?!
> 
> No, the TCP stack in the android/iOS/Windows kernel is so out of date
> that in order to get even moderately recent TCP features it is
> necessary to do this.

I see.
So the plan is bringing TOU into almost every kernel out there
and then ship Apps with their own TCP stacks since vendors are unable
to deliver decent updates.

I didn't realize that the situation is *that* worse. :(

Thanks,
//richard


Re: [PATCH net-next 0/8] tou: Transports over UDP - part I

2016-06-22 Thread Richard Weinberger
On Thu, Jun 16, 2016 at 7:51 PM, Tom Herbert  wrote:
> Transports over UDP is intended to encapsulate TCP and other transport
> protocols directly and securely in UDP.
>
> The goal of this work is twofold:
>
> 1) Allow applications to run their own transport layer stack (i.e.from
>userspace). This eliminates dependencies on the OS (e.g. solves a
>major dependency issue for Facebook on clients).

Facebook on clients would be a Facebook app on mobile devices?
Does that mean that the Facebook app is so advanced and complicated
that it needs a special TCP stack?!

-- 
Thanks,
//richard


Re: [PATCH] net:ppp: replace too strict capability restriction on opening /dev/ppp

2016-06-20 Thread Richard Weinberger
Am 20.06.2016 um 07:02 schrieb Andi Kleen:
> Shanker Wang  writes:
> 
>> This patch removes the check for CAP_NET_ADMIN in the initial namespace
>> when opening /dev/open. Instead, CAP_NET_ADMIN is checked in the user
>> namespace the net namespace was created so that /dev/ppp cat get opened
>> in a unprivileged container.
> 
> Seems dangerous. From a quick look at the PPP ioctl there is no limit
> how many PPP devices this can create. So a container having access to
> this would be able to fill all kernel memory. Probably needs more
> auditing and hardening first.
> 
> In general there seems to be a lot of attack surface for root
> in PPP.

You are right.
Shanker Wang, I had also another at the open function, it is more complicated
than I thought. Please see how ppp_unattached_ioctl() is called.
Before we give containers access to it the use of nsproxy has to be removed.
Not sure how easy this will be, especially since you cannot break existing 
users.

Thanks,
//richard




Re: [PATCH] net:ppp: replace too strict capability restriction on opening /dev/ppp

2016-06-19 Thread Richard Weinberger
Am 19.06.2016 um 12:36 schrieb Shanker Wang:
> 
>> 在 2016年6月19日,12:13,Richard Weinberger <rich...@nod.at> 写道:
>>
>> Am 19.06.2016 um 07:21 schrieb Shanker Wang:
>>> This patch removes the check for CAP_NET_ADMIN in the initial namespace
>>> when opening /dev/open. Instead, CAP_NET_ADMIN is checked in the user
>>> namespace the net namespace was created so that /dev/ppp cat get opened
>>> in a unprivileged container.
>>>
>>> Cc: Hannes Frederic Sowa <han...@stressinduktion.org>
>>> Cc: Richard Weinberger <richard.weinber...@gmail.com>
>>> Cc: Guillaume Nault <g.na...@alphalink.fr>
>>> Cc: Miao Wang <shankerwangm...@gmail.com>
>>> Signed-off-by: Miao Wang <miao.w...@tuna.tsinghua.edu.cn>
>>> ---
>>> drivers/net/ppp/ppp_generic.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
>>> index f572b31..4b3b2b5 100644
>>> --- a/drivers/net/ppp/ppp_generic.c
>>> +++ b/drivers/net/ppp/ppp_generic.c
>>> @@ -380,7 +380,7 @@ static int ppp_open(struct inode *inode, struct file 
>>> *file)
>>> /*
>>>  * This could (should?) be enforced by the permissions on /dev/ppp.
>>>  */
>>> -   if (!capable(CAP_NET_ADMIN))
>>> +   if (!ns_capable(current->nsproxy->net_ns->user_ns, CAP_NET_ADMIN))
>>> return -EPERM;
>>
>> Shouldn't this be a ns_capable(net->user_ns, …?
>> Otherwise an user can create a new user_ns followed by a new net_ns and has
>> CAP_NET_ADMIN. We need to check whether he is allowed in the user_ns of the
>> net_ns which belongs to the ppp net device which you want to open.
> You are totally right. However, I wonder how can i get the “net” struct when 
> opening /dev/ppp

I'm sure you can get it somehow via file->private_data.

Thanks,
//richard


Re: [PATCH] net:ppp: replace too strict capability restriction on opening /dev/ppp

2016-06-19 Thread Richard Weinberger
Am 19.06.2016 um 07:21 schrieb Shanker Wang:
> This patch removes the check for CAP_NET_ADMIN in the initial namespace
> when opening /dev/open. Instead, CAP_NET_ADMIN is checked in the user
> namespace the net namespace was created so that /dev/ppp cat get opened
> in a unprivileged container.
> 
> Cc: Hannes Frederic Sowa <han...@stressinduktion.org>
> Cc: Richard Weinberger <richard.weinber...@gmail.com>
> Cc: Guillaume Nault <g.na...@alphalink.fr>
> Cc: Miao Wang <shankerwangm...@gmail.com>
> Signed-off-by: Miao Wang <miao.w...@tuna.tsinghua.edu.cn>
> ---
> drivers/net/ppp/ppp_generic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index f572b31..4b3b2b5 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -380,7 +380,7 @@ static int ppp_open(struct inode *inode, struct file 
> *file)
>   /*
>* This could (should?) be enforced by the permissions on /dev/ppp.
>*/
> - if (!capable(CAP_NET_ADMIN))
> + if (!ns_capable(current->nsproxy->net_ns->user_ns, CAP_NET_ADMIN))
>   return -EPERM;

Shouldn't this be a ns_capable(net->user_ns, ...?
Otherwise an user can create a new user_ns followed by a new net_ns and has
CAP_NET_ADMIN. We need to check whether he is allowed in the user_ns of the
net_ns which belongs to the ppp net device which you want to open.

Thanks,
//richard


Re: [Question] Should `CAP_NET_ADMIN` be needed when opening `/dev/ppp`?

2016-05-03 Thread Richard Weinberger
On Tue, May 3, 2016 at 12:12 PM, Guillaume Nault  wrote:
> On Sun, May 01, 2016 at 09:38:57PM +0800, Wang Shanker wrote:
>> static int ppp_open(struct inode *inode, struct file *file)
>> {
>>   /*
>>* This could (should?) be enforced by the permissions on /dev/ppp.
>>*/
>>   if (!capable(CAP_NET_ADMIN))
>>   return -EPERM;
>>   return 0;
>> }
>> ```
>>
>> I wonder why CAP_NET_ADMIN is needed here, rather than leaving it to the
>> permission of the device node. If there is no need, I suggest that the
>> CAP_NET_ADMIN check be removed.
>>
> If this test was removed here, then it'd have to be added again in the
> PPPIOCNEWUNIT ioctl, at the very least, because creating a netdevice
> should require CAP_NET_ADMIN. Therefore that wouldn't help for your
> case.
> I don't know why the test was placed in ppp_open() in the first place,
> but changing it now would have side effects on user space. So I'd
> rather leave the code as is.

I think the question is whether we really require having CAP_NET_ADMIN
in the initial namespace and not just in the current one.
Is ppp not network namespace aware?

-- 
Thanks,
//richard


Re: [PATCH -next 0/5] netlink: remove mmapped netlink support

2016-03-27 Thread Richard Weinberger
On Thu, Feb 18, 2016 at 3:03 PM, Florian Westphal  wrote:
> As discussed during netconf 2016 in Seville, this series removes
> CONFIG_NETLINK_MMAP.

Sorry for hopping in so^Wtoo late.
I always thought mmaped netlink is the way to go for
userspace packet processing.
Sure, the problems you state are real but that the whole concept
is now thrown away kind of surprises me.

> Close to three years after it was merged it has retained several problems
> that do not appear to be fixable.
>
> No official netfilter libmnl release contains support for mmap backed netlink
> sockets. No openvswitch release makes use of it either.
>
> To use the mmap interface, userspace not only has to probe for mmap netlink
> support, it also has to implement a recv/socket receive path in order to
> handle messages that exceed the size of an rx ring element 
> (NL_MMAP_STATUS_COPY).
>
> So if there are odd programs out there that attempt to use MMAP netlink
> they should continue to work as they already need a socket based code path
> to work properly.
>
> The actual revert (first patch) has a list of problems.
> The followup patches remove a couple of helpers that are no longer needed
> after the revert.
>
> I did a few tests with mmap vs. socket based interface on a 4.4 based
> kernel on an i7-4790 box and there are no performance advantages:

Did you also test smaller devices?
i.e. stuff one would use for cheap routers.

-- 
Thanks,
//richard


Re: Computer fails to resume from suspend unless I rmmod jme before initiating the suspend

2016-02-14 Thread Richard Weinberger
Diego,

On Sun, Feb 14, 2016 at 9:16 PM, Diego Viola  wrote:
> Can someone please help?

as I wrote on IRC, please wait at least a full week.

-- 
Thanks,
//richard


[PATCH 13/22] net: Fix dependencies for !HAS_IOMEM archs

2016-01-25 Thread Richard Weinberger
Not every arch has io memory.
So, unbreak the build by fixing the dependencies.

Signed-off-by: Richard Weinberger <rich...@nod.at>
---
 drivers/net/ethernet/ezchip/Kconfig | 1 +
 drivers/net/phy/Kconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ezchip/Kconfig 
b/drivers/net/ethernet/ezchip/Kconfig
index 48ecbc8..b423ad3 100644
--- a/drivers/net/ethernet/ezchip/Kconfig
+++ b/drivers/net/ethernet/ezchip/Kconfig
@@ -18,6 +18,7 @@ if NET_VENDOR_EZCHIP
 config EZCHIP_NPS_MANAGEMENT_ENET
tristate "EZchip NPS management enet support"
depends on OF_IRQ && OF_NET
+   depends on HAS_IOMEM
---help---
  Simple LAN device for debug or management purposes.
  Device supports interrupts for RX and TX(completion).
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 60994a8..f0a7702 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -186,6 +186,7 @@ config MDIO_GPIO
 config MDIO_OCTEON
tristate "Support for MDIO buses on Octeon and ThunderX SOCs"
depends on 64BIT
+   depends on HAS_IOMEM
help
 
  This module provides a driver for the Octeon and ThunderX MDIO
-- 
1.8.4.5



Re: [PATCH net] bridge: Only call /sbin/bridge-stp for the initial network namespace

2015-11-30 Thread Richard Weinberger
Am 30.11.2015 um 22:38 schrieb Eric W. Biederman:
> 
> There is no defined mechanism to pass network namespace information
> into /sbin/bridge-stp therefore don't even try to invoke it except
> for bridge devices in the initial network namespace.
> 
> It is possible for unprivileged users to cause /sbin/bridge-stp to be
> invoked for any network device name which if /sbin/bridge-stp does not
> guard against unreasonable arguments or being invoked twice on the same
> network device could cause problems.
> 
> Signed-off-by: "Eric W. Biederman" 

Just figured that /sbin/bridge-stp is a shell script.
Network interfaces can contain a lot of funny characters,
maybe this is after all a security issue.

Thanks,
//richard

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: user controllable usermodehelper in br_stp_if.c

2015-11-30 Thread Richard Weinberger
Am 30.11.2015 um 21:14 schrieb Kees Cook:
> On Sun, Nov 29, 2015 at 2:43 PM, Richard Weinberger <rich...@nod.at> wrote:
>> Hi!
>>
>> By spawning new network and user namesapces an unprivileged user
>> is able to execute /sbin/bridge-stp within the initial mount namespace
>> with global root rights.
>> While this cannot directly be used to break out of a container or gain
>> global root rights it could be used by exploit writers as valuable building 
>> block.
>>
>> e.g.
>> $ unshare -U -r -n /bin/sh
>> $ brctl addbr br0
>> $ brctl stp br0 on # this will execute /sbin/bridge-stp
>>
>> As this mechanism clearly cannot work with containers and seems to be legacy 
>> code
>> I suggest not calling call_usermodehelper() at all if we're not in the 
>> initial user namespace.
>> What do you think?
> 
> I'm not familiar with how bridge-stp is expected to operate with a
> network namespace, but if it's meaningless, then yeah, that seems like
> a reasonable change. Can you send a patch? (Also, if it's legacy code,
> maybe it could be turned off entirely, not just for containers?)

Eric was faster than me. :-)

BTW: kernel.core_pattern is also worth a look.
If the pipe mode is used, "|/bin/core_tool", it will be executed in the
initial namespace and any user/container can trigger it.
Shayan reported that some weeks ago: https://lkml.org/lkml/2015/10/24/134

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


user controllable usermodehelper in br_stp_if.c

2015-11-29 Thread Richard Weinberger
Hi!

By spawning new network and user namesapces an unprivileged user
is able to execute /sbin/bridge-stp within the initial mount namespace
with global root rights.
While this cannot directly be used to break out of a container or gain
global root rights it could be used by exploit writers as valuable building 
block.

e.g.
$ unshare -U -r -n /bin/sh
$ brctl addbr br0
$ brctl stp br0 on # this will execute /sbin/bridge-stp

As this mechanism clearly cannot work with containers and seems to be legacy 
code
I suggest not calling call_usermodehelper() at all if we're not in the initial 
user namespace.
What do you think?

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bridge-utils: wrong sysfs path odds

2015-11-25 Thread Richard Weinberger
Am 25.11.2015 um 23:38 schrieb Florian Fainelli:
> On 25/11/15 01:21, Richard Weinberger wrote:
>> Am 25.11.2015 um 01:37 schrieb Stephen Hemminger:
>>> On Wed, 25 Nov 2015 01:24:47 +0100
>>> Richard Weinberger <rich...@nod.at> wrote:
>>>
>>>> Am 25.11.2015 um 01:15 schrieb Richard Weinberger:
>>>>> Hi!
>>>>>
>>>>> Today I was hunting down an issue where "brctl stp br0 off"
>>>>> always failed on mips64be with n32 userland.
>>>>>
>>>>> It turned out that the ioctl(fd, SIOCDEVPRIVATE, ) with 
>>>>> BRCTL_SET_BRIDGE_STP_STATE
>>>>> returned -EOPNOTSUPP.
>>>>> First I thought that this is a plain ABI issue on mips as in 
>>>>> old_dev_ioctl()
>>>>> the ioctl() argument was 0x1 instead of the expected 
>>>>> BRCTL_SET_BRIDGE_STP_STATE (0x14)
>>>>
>>>> Should be 0xe and not 0x14. It is 14 in decimal. :)
>>>>
>>>> Thanks,
>>>> //richard
>>>
>>> Ask Debian maintainer to send his patches, I don't go patch hunting.
>>>
>>
>> While looking what other distros do I came across this patch:
>> https://pkgs.fedoraproject.org/cgit/bridge-utils.git/tree/bridge-utils-1.5-check-error-returns-from-write-to-sysfs.patch
>>
>> Beside of checking return errors is fixes also the sysfs path in br_set().
>> Can you please merge it upstream?
>>
>> Distros seems to carry more patches for that package, if it helps I can do 
>> the patch hunting for you.
>> It would be nice to have a recent bridge-utils release. The last one is from 
>> 2011.
> 
> Most of what bridge-utils does can be done by iproute2's bridge
> sub-command FWIW.

Sure, but a lot of userspace still depends on brctl.
And IMHO brctl is magnitudes easier to use than iproute's bridge tool.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bridge-utils: wrong sysfs path odds

2015-11-25 Thread Richard Weinberger
Hi!

Am 25.11.2015 um 23:30 schrieb Santiago Garcia Mantinan:
> Hi!
> 
>> Ask Debian maintainer to send his patches, I don't go patch hunting.
> 
> 
> While looking what other distros do I came across this patch:
> 
> https://pkgs.fedoraproject.org/cgit/bridge-utils.git/tree/bridge-utils-1.5-check-error-returns-from-write-to-sysfs.patch
> 
> 
> Don't know about the fedora guys but at least I did send some patches which 
> include this /bridge addition, and Stephen did even ack it as you can read on:
> http://lists.linuxfoundation.org/pipermail/bridge/2011-May/007646.html

Meanwhile I've found 
https://git.kernel.org/cgit/linux/kernel/git/shemminger/bridge-utils.git/
It contains some fixes.
So, the problem is that we have fixes but no release which contains these.
Stephen, can you do a maintenance release? Maybe 1.5.1?

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bridge-utils: wrong sysfs path odds

2015-11-25 Thread Richard Weinberger
Am 25.11.2015 um 01:37 schrieb Stephen Hemminger:
> On Wed, 25 Nov 2015 01:24:47 +0100
> Richard Weinberger <rich...@nod.at> wrote:
> 
>> Am 25.11.2015 um 01:15 schrieb Richard Weinberger:
>>> Hi!
>>>
>>> Today I was hunting down an issue where "brctl stp br0 off"
>>> always failed on mips64be with n32 userland.
>>>
>>> It turned out that the ioctl(fd, SIOCDEVPRIVATE, ) with 
>>> BRCTL_SET_BRIDGE_STP_STATE
>>> returned -EOPNOTSUPP.
>>> First I thought that this is a plain ABI issue on mips as in old_dev_ioctl()
>>> the ioctl() argument was 0x1 instead of the expected 
>>> BRCTL_SET_BRIDGE_STP_STATE (0x14)
>>
>> Should be 0xe and not 0x14. It is 14 in decimal. :)
>>
>> Thanks,
>> //richard
> 
> Ask Debian maintainer to send his patches, I don't go patch hunting.
> 

While looking what other distros do I came across this patch:
https://pkgs.fedoraproject.org/cgit/bridge-utils.git/tree/bridge-utils-1.5-check-error-returns-from-write-to-sysfs.patch

Beside of checking return errors is fixes also the sysfs path in br_set().
Can you please merge it upstream?

Distros seems to carry more patches for that package, if it helps I can do the 
patch hunting for you.
It would be nice to have a recent bridge-utils release. The last one is from 
2011.

Thanks,
//richard

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bridge-utils: wrong sysfs path odds

2015-11-24 Thread Richard Weinberger
Am 25.11.2015 um 01:37 schrieb Stephen Hemminger:
> On Wed, 25 Nov 2015 01:24:47 +0100
> Richard Weinberger <rich...@nod.at> wrote:
> 
>> Am 25.11.2015 um 01:15 schrieb Richard Weinberger:
>>> Hi!
>>>
>>> Today I was hunting down an issue where "brctl stp br0 off"
>>> always failed on mips64be with n32 userland.
>>>
>>> It turned out that the ioctl(fd, SIOCDEVPRIVATE, ) with 
>>> BRCTL_SET_BRIDGE_STP_STATE
>>> returned -EOPNOTSUPP.
>>> First I thought that this is a plain ABI issue on mips as in old_dev_ioctl()
>>> the ioctl() argument was 0x1 instead of the expected 
>>> BRCTL_SET_BRIDGE_STP_STATE (0x14)
>>
>> Should be 0xe and not 0x14. It is 14 in decimal. :)
>>
>> Thanks,
>> //richard
> 
> Ask Debian maintainer to send his patches, I don't go patch hunting.

He is Cc'ed :-)

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


bridge-utils: wrong sysfs path odds

2015-11-24 Thread Richard Weinberger
Hi!

Today I was hunting down an issue where "brctl stp br0 off"
always failed on mips64be with n32 userland.

It turned out that the ioctl(fd, SIOCDEVPRIVATE, ) with 
BRCTL_SET_BRIDGE_STP_STATE
returned -EOPNOTSUPP.
First I thought that this is a plain ABI issue on mips as in old_dev_ioctl()
the ioctl() argument was 0x1 instead of the expected BRCTL_SET_BRIDGE_STP_STATE 
(0x14)

Further investigation showed that brctl first tries to open the sysfs file
"/sys/class/net/br0/stp_state" and falls back to the legacy ioctl() upon 
failure.

On my mips setup old_dev_ioctl() seems not to work. And the function's comment
is correct:
/*
 * Legacy ioctl's through SIOCDEVPRIVATE
 * This interface is deprecated because it was too difficult to
 * to do the translation for 32/64bit ioctl compatibility.
 */

Later I've realized that the sysfs path is wrong, the "bridge/" directory
part is missing.
On most setups nobody would notice as the fallback ioctl() works.

Debian's bridge-utils package carries a patch which fixes the sysfs paths.
Can we please have this patch also in upstream bridge-utils?

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bridge-utils: wrong sysfs path odds

2015-11-24 Thread Richard Weinberger
Am 25.11.2015 um 01:15 schrieb Richard Weinberger:
> Hi!
> 
> Today I was hunting down an issue where "brctl stp br0 off"
> always failed on mips64be with n32 userland.
> 
> It turned out that the ioctl(fd, SIOCDEVPRIVATE, ) with 
> BRCTL_SET_BRIDGE_STP_STATE
> returned -EOPNOTSUPP.
> First I thought that this is a plain ABI issue on mips as in old_dev_ioctl()
> the ioctl() argument was 0x1 instead of the expected 
> BRCTL_SET_BRIDGE_STP_STATE (0x14)

Should be 0xe and not 0x14. It is 14 in decimal. :)

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] hyperv: Add handler for RNDIS_STATUS_NETWORK_CHANGE event

2015-10-30 Thread Richard Weinberger
Am 30.10.2015 um 23:03 schrieb Haiyang Zhang:
> 
> 
>> -Original Message-
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Friday, October 30, 2015 6:56 AM
>> To: Haiyang Zhang <haiya...@microsoft.com>
>> Cc: Richard Weinberger <richard.weinber...@gmail.com>; David Miller
>> <da...@davemloft.net>; o...@aepfle.de; jasow...@redhat.com; driverdev-
>> de...@linuxdriverproject.org; LKML <linux-ker...@vger.kernel.org>;
>> netdev@vger.kernel.org
>> Subject: Re: [PATCH net-next] hyperv: Add handler for
>> RNDIS_STATUS_NETWORK_CHANGE event
>>
>> Haiyang Zhang <haiya...@microsoft.com> writes:
>>
>>>> -Original Message-
>>>> From: Richard Weinberger [mailto:richard.weinber...@gmail.com]
>>>> Sent: Tuesday, October 27, 2015 6:36 PM
>>>> To: David Miller <da...@davemloft.net>
>>>> Cc: Haiyang Zhang <haiya...@microsoft.com>; o...@aepfle.de; Greg
>> Kroah-
>>>> Hartman <g...@kroah.com>; netdev@vger.kernel.org; jasow...@redhat.com;
>>>> driverdev-de...@linuxdriverproject.org; LKML >>> ker...@vger.kernel.org>
>>>> Subject: Re: [PATCH net-next] hyperv: Add handler for
>>>> RNDIS_STATUS_NETWORK_CHANGE event
>>>>
>>>> On Mon, Jun 23, 2014 at 10:10 PM, David Miller <da...@davemloft.net>
>>>> wrote:
>>>>> From: Haiyang Zhang <haiya...@microsoft.com>
>>>>> Date: Mon, 23 Jun 2014 16:09:59 +
>>>>>
>>>>>> So, what's the equivalent or similar command to "network restart"
>> on
>>>> SLES12? Could
>>>>>> you update the command line for the usermodehelper when porting
>> this
>>>> patch to SLES
>>>>>> 12?
>>>>>
>>>>> No, you are not going to keep the usermodehelper invocation in your
>>>> driver
>>>>> please remove it.  It is absolutely inappropriate, and I strictly
>> do
>>>> not want
>>>>> to keep it in there because other people will copy it and then
>> we'll
>>>> have a
>>>>> real mess on our hands.
>>>>
>>>> Sorry for digging up this old thread.
>>>> While talking with some guys about usermodehelper abuses I came
>> across
>>>> this gem.
>>>> Mainline still contains that "/etc/init.d/network restart" code.
>>>> Haiyang, care to cleanup?
>>>
>>> Hi Richard and others,
>>>
>>> Thanks for the reminder. I will clean up the usermode helper.
>>>
>>> Do you have suggestions of trigger DHCP refresh from kernel mode? Any
>>> sample code in the existing kernel code?
>>>
>>
>> I think it's wrong to call dhcp refresh from kernel. What happens when
>> we reconnect normal hardware adapter to another network? Link goes down
>> and then up and userspace is supposed to react accordingly. I think we
>> should emulate something similar for RNDIS_STATUS_NETWORK_CHANGE.
> 
> When link is down physically for a few seconds, the DHCP will automatically 
> refresh. I will add code to emulate this. There were some discussions around 
> this and other possibilities previously... I agree emulating what happens 
> with physically plug/unplug a cable is a reasonable way to trigger the DHCP 
> refresh.

Can't you propagate the event to userspace and let it take an appropriate
action?

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] hyperv: Add handler for RNDIS_STATUS_NETWORK_CHANGE event

2015-10-27 Thread Richard Weinberger
On Mon, Jun 23, 2014 at 10:10 PM, David Miller  wrote:
> From: Haiyang Zhang 
> Date: Mon, 23 Jun 2014 16:09:59 +
>
>> So, what's the equivalent or similar command to "network restart" on SLES12? 
>> Could
>> you update the command line for the usermodehelper when porting this patch 
>> to SLES
>> 12?
>
> No, you are not going to keep the usermodehelper invocation in your driver
> please remove it.  It is absolutely inappropriate, and I strictly do not want
> to keep it in there because other people will copy it and then we'll have a
> real mess on our hands.

Sorry for digging up this old thread.
While talking with some guys about usermodehelper abuses I came across this gem.
Mainline still contains that "/etc/init.d/network restart" code.
Haiyang, care to cleanup?

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-wired-lan] [PATCH 1/1] igb: Use ARRAY_SIZE instead fo sizeof(a)/sizeof(a[0])

2015-06-30 Thread Richard Weinberger
Hi!

Am 30.06.2015 um 22:16 schrieb Fujinaka, Todd:
 Sorry for the top-posting, but I'm provided with the tools they give me and 
 bottom posting from Outlook just confuses email threads. Plus, this was 
 crossposted all over creation and cc-ed to anyone with an intel address.
 
 I still would say no if I'm allowed, because to guarantee that this change - 
 that I don't think fixes anything - works in all cases, we need to do an 
 incredible amount of regression testing. Every variant of every Intel part 
 that uses this driver (and there are many) should be tested and will end up 
 being used by the community.
 
 Plus, you have no idea the number of obscure bugs I have to deal with as the 
 guy answering customer questions. If this triggers some odd embedded compiler 
 bug, I'm going to have to dig it out. Unless there is an actual bug, I'd like 
 to leave it as it is.

If you don't dare to touch your driver please update it's maintenance status.
Supported is definitely not the case, maybe Odd fixes would fit better.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-wired-lan] [PATCH 1/1] igb: Use ARRAY_SIZE instead fo sizeof(a)/sizeof(a[0])

2015-06-30 Thread Richard Weinberger
On Tue, Jun 30, 2015 at 4:53 PM, Fujinaka, Todd todd.fujin...@intel.com wrote:
 I don't see the reason this is needed so I'm going to say NAK.

Using generic functions is always better than open coded stuff.
Linux's ARRAY_SIZE also makes sure that the passed variable is actually
an array.

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lib: test_bpf: purge CPP register redefinitions

2015-06-22 Thread Richard Weinberger
Am 22.06.2015 um 08:52 schrieb Richard Weinberger:
 Am 22.06.2015 um 08:05 schrieb Alexei Starovoitov:
 to get rid of warning you proposing to do 1k line renames?!
 Just add:
 +#undef R8
 +#undef R9
 +#undef R10
  #define R0 BPF_REG_0
 
 This would be also just another hack.
 
 Though I think the better fix woud be to clean up:
 arch/x86/include/uapi/asm/ptrace-abi.h
 What's the point of:
 #define R8 72
 from 'uapi' point of view?
 
 To query cpu registers using ptrace(2).
 
 Look like kernel details that shouldn't be exposed in uapi.
 
 These are not kernel details.
 
 Actually the problem is the other way around.
 UML is Linux ported to it's own userspace ABI.
 Hence, the arch/um and arch/x86/um use uapi header files.
 
 Maybe we can rework UML's header files such that
 no uapi header pollutes the kernel namespace.

While riding the bus to my office I've materialized that idea.
Nicolai, can you please give the attached patch a try?

Thanks,
//richard
diff --git a/arch/um/include/asm/ptrace-generic.h b/arch/um/include/asm/ptrace-generic.h
index cb9b3c4..7485162 100644
--- a/arch/um/include/asm/ptrace-generic.h
+++ b/arch/um/include/asm/ptrace-generic.h
@@ -8,7 +8,6 @@
 
 #ifndef __ASSEMBLY__
 
-#include asm/ptrace-abi.h
 #include sysdep/ptrace.h
 
 struct pt_regs {
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index 174ee50..7a0b2d9 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -8,6 +8,7 @@
 #include linux/sched.h
 #include linux/tracehook.h
 #include asm/uaccess.h
+#include asm/ptrace-abi.h
 
 void user_enable_single_step(struct task_struct *child)
 {
diff --git a/arch/x86/um/ptrace_32.c b/arch/x86/um/ptrace_32.c
index ce3dd4f..a29756f 100644
--- a/arch/x86/um/ptrace_32.c
+++ b/arch/x86/um/ptrace_32.c
@@ -6,6 +6,7 @@
 #include linux/mm.h
 #include linux/sched.h
 #include asm/uaccess.h
+#include asm/ptrace-abi.h
 #include skas.h
 
 extern int arch_switch_tls(struct task_struct *to);
diff --git a/arch/x86/um/ptrace_64.c b/arch/x86/um/ptrace_64.c
index 3b52bf0..a629694 100644
--- a/arch/x86/um/ptrace_64.c
+++ b/arch/x86/um/ptrace_64.c
@@ -11,6 +11,7 @@
 #define __FRAME_OFFSETS
 #include asm/ptrace.h
 #include asm/uaccess.h
+#include asm/ptrace-abi.h
 
 /*
  * determines which flags the user has access to.
diff --git a/arch/x86/um/tls_32.c b/arch/x86/um/tls_32.c
index 80ffa5b..48e3858 100644
--- a/arch/x86/um/tls_32.c
+++ b/arch/x86/um/tls_32.c
@@ -7,6 +7,7 @@
 #include linux/sched.h
 #include linux/syscalls.h
 #include asm/uaccess.h
+#include asm/ptrace-abi.h
 #include os.h
 #include skas.h
 #include sysdep/tls.h
diff --git a/arch/x86/um/tls_64.c b/arch/x86/um/tls_64.c
index d22363c..3ad7143 100644
--- a/arch/x86/um/tls_64.c
+++ b/arch/x86/um/tls_64.c
@@ -1,4 +1,5 @@
 #include linux/sched.h
+#include asm/ptrace-abi.h
 
 void clear_flushed_tls(struct task_struct *task)
 {


Re: [PATCH] lib: test_bpf: purge CPP register redefinitions

2015-06-22 Thread Richard Weinberger
Am 22.06.2015 um 08:05 schrieb Alexei Starovoitov:
 to get rid of warning you proposing to do 1k line renames?!
 Just add:
 +#undef R8
 +#undef R9
 +#undef R10
  #define R0 BPF_REG_0

This would be also just another hack.

 Though I think the better fix woud be to clean up:
 arch/x86/include/uapi/asm/ptrace-abi.h
 What's the point of:
 #define R8 72
 from 'uapi' point of view?

To query cpu registers using ptrace(2).

 Look like kernel details that shouldn't be exposed in uapi.

These are not kernel details.

Actually the problem is the other way around.
UML is Linux ported to it's own userspace ABI.
Hence, the arch/um and arch/x86/um use uapi header files.

Maybe we can rework UML's header files such that
no uapi header pollutes the kernel namespace.

That said, lib/test_bpf.c should still not use
defines like R8 as such symbols are very generic.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe netdev in


Re: kmalloc panic

2015-05-28 Thread Richard Weinberger
On Thu, May 28, 2015 at 9:21 AM, pavani
pavani.muthy...@redpinesignals.com wrote:
 Hi Cong ,

 Thanks for the response.

 Where we need to fix the bug ?I mean in the driver or kernel source code or
 hardware level.

The more interesting question is, is this a recent and pristine kernel
from kernel.org?

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v3 00/10] an introduction of library operating system for Linux (LibOS)

2015-04-24 Thread Richard Weinberger
Hi!

Am 24.04.2015 um 10:22 schrieb Hajime Tazaki:
 You *really* need to shape up wrt the build process.
 
 at the moment, the implementation of libos can't automate to
 follow such changes in the build process. but good news is
 it's a trivial task to follow up the latest function.
 
 my observation on this manual follow up since around 3.7
 kernel (2.5 yrs ago) is that these changes mostly happened
 during merge-window of each new version, and the fix only
 takes a couple of hours at maximum.
 
 I think I can survive with these changes but I'd like to ask
 broader opinions.
 
 
 one more question:
 
 I'd really like to have a suggestion on which tree I should
 base for libos tree.
 
 I'm proposing a patchset to arnd/asm-generic tree (which I
 believe the base tree for new arch/), while the patchset is
 tested with davem/net-next tree because right now libos is
 only for net/.
 
 shall I propose a patchset based on Linus' tree instead ?

I'd suggest the following:
Maintain LibOS in your git tree and follow Linus' tree.
Make sure that all kernel releases build and work.

This way you can experiment with automation and other
stuff. If it works well you can ask for mainline inclusion
after a few kernel releases.

Your git history will show how much maintenance burden
LibOS has and how much with every merge window breaks and
needs manual fixup.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v3 00/10] an introduction of library operating system for Linux (LibOS)

2015-04-24 Thread Richard Weinberger
Hi!

Am 19.04.2015 um 15:28 schrieb Hajime Tazaki:
 changes from v2:
 - Patch 02/11 (slab: add private memory allocator header for arch/lib)
 * add new allocator named SLIB (Library Allocator): Patch 04/11 is integrated
   to 02 (commented by Christoph Lameter)
 - Overall
 * rewrite commit log messages
 
 changes from v1:
 - Patch 01/11 (sysctl: make some functions unstatic to access by arch/lib):
 * add prefix ctl_table_ to newly publiced functions (commented by Joe Perches)
 - Patch 08/11 (lib: other kernel glue layer code):
 * significantly reduce glue codes (stubs) (commented by Richard Weinberger)
 - Others
 * adapt to linux-4.0.0
 * detect make dependency by Kbuild .cmd files

I still fail to build it. :-(

for-asm-upstream-v3 on top of Linus' tree gives:

rw@sandpuppy:~/linux (libos $) make library ARCH=lib
  OBJS-MK   arch/lib/objs.mk
arch/lib/Makefile.print:41: target 'lzo/' given more than once in the same rule.
make[2]: Nothing to be done for '.config'.
scripts/kconfig/conf  --silentoldconfig arch/lib/Kconfig
  CHK include/config/kernel.release
  CHK include/generated/utsrelease.h
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/compile.h
  GEN arch/lib/timeconst.h
  GEN arch/lib/linker.lds
  CC  arch/lib/lib.o
  CC  arch/lib/lib-device.o
  CC  arch/lib/lib-socket.o
  CC  arch/lib/random.o
  CC  arch/lib/softirq.o
  CC  arch/lib/time.o
  CC  arch/lib/timer.o
  CC  arch/lib/hrtimer.o
  CC  arch/lib/sched.o
  CC  arch/lib/workqueue.o
  CC  arch/lib/print.o
  CC  arch/lib/tasklet.o
  CC  arch/lib/tasklet-hrtimer.o
  CC  arch/lib/glue.o
  CC  arch/lib/fs.o
  CC  arch/lib/sysctl.o
  CC  arch/lib/proc.o
  CC  arch/lib/sysfs.o
  CC  arch/lib/capability.o
arch/lib/capability.c:16:6: error: redefinition of ‘capable’
 bool capable(int cap)
  ^
In file included from arch/lib/capability.c:9:0:
./include/linux/capability.h:236:20: note: previous definition of ‘capable’ was 
here
 static inline bool capable(int cap)
^
arch/lib/capability.c:39:6: error: redefinition of ‘ns_capable’
 bool ns_capable(struct user_namespace *ns, int cap)
  ^
In file included from arch/lib/capability.c:9:0:
./include/linux/capability.h:240:20: note: previous definition of ‘ns_capable’ 
was here
 static inline bool ns_capable(struct user_namespace *ns, int cap)
^
arch/lib/Makefile:210: recipe for target 'arch/lib/capability.o' failed
make: *** [arch/lib/capability.o] Error 1

And on top of v4.0 it fails too:

rw@sandpuppy:~/linux (libos-v4.0 $) make library ARCH=lib
  OBJS-MK   arch/lib/objs.mk
arch/lib/Makefile.print:41: target 'lzo/' given more than once in the same rule.
make[2]: Nothing to be done for '.config'.
scripts/kconfig/conf --silentoldconfig arch/lib/Kconfig
  CHK include/config/kernel.release
  CHK include/generated/utsrelease.h
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/compile.h
  GEN arch/lib/timeconst.h
  GEN arch/lib/linker.lds
  CC  arch/lib/lib.o
  CC  arch/lib/lib-device.o
  CC  arch/lib/lib-socket.o
arch/lib/lib-socket.c: In function ‘lib_sock_sendmsg’:
arch/lib/lib-socket.c:114:2: error: too few arguments to function ‘sock_sendmsg’
  retval = sock_sendmsg(kernel_socket, msg_sys);
  ^
In file included from arch/lib/lib-socket.c:12:0:
./include/linux/net.h:216:5: note: declared here
 int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len);
 ^
arch/lib/Makefile:210: recipe for target 'arch/lib/lib-socket.o' failed
make: *** [arch/lib/lib-socket.o] Error 1

You *really* need to shape up wrt the build process.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 02/11] slab: add private memory allocator header for arch/lib

2015-04-17 Thread Richard Weinberger
Am 17.04.2015 um 14:17 schrieb Christoph Lameter:
 On Fri, 17 Apr 2015, Hajime Tazaki wrote:
 
 add header includion for CONFIG_LIB to wrap kmalloc and co. This will
 bring malloc(3) based allocator used by arch/lib.
 
 Maybe add another allocator insteadl? SLLB which implements memory
 management using malloc()?

Yeah, that's a good idea.

Hajime, another question, do you really want a malloc/free backend?
I'm not a mm expert, but does malloc() behave exactly as the kernel
counter parts?

In UML we allocate a big file on the host side, mmap() it and give this mapping
to the kernel as physical memory such that any kernel allocator can work with 
it.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 02/11] slab: add private memory allocator header for arch/lib

2015-04-17 Thread Richard Weinberger
Am 17.04.2015 um 17:02 schrieb Hajime Tazaki:
 
 Hi Christoph, Richard,
 
 At Fri, 17 Apr 2015 14:44:35 +0200,
 Richard Weinberger wrote:

 Am 17.04.2015 um 14:17 schrieb Christoph Lameter:
 On Fri, 17 Apr 2015, Hajime Tazaki wrote:

 add header includion for CONFIG_LIB to wrap kmalloc and co. This will
 bring malloc(3) based allocator used by arch/lib.

 Maybe add another allocator insteadl? SLLB which implements memory
 management using malloc()?

 Yeah, that's a good idea.
 
 first, my bad, I should be more precise on the commit message.
 
 the patch with 04/11 patch is used _not_ only malloc(3) but
 also any allocator registered by our entry API, lib_init().
 
 for NUSE case, we use malloc(3) but for DCE (ns-3) case, we
 use our own allocator, which manages the (virtual) process
 running on network simulator.
 
 if these externally configurable memory allocator are point
 of interest in Linux kernel, maybe adding another allocator
 into mm/ is interesting but I'm not sure. what do you think ?

This is the idea behind SLLB.

 btw, what does stand for SLLB ? (just curious)

SLUB is the unqueued SLAB and SLLB is the library SLAB. :D

 Hajime, another question, do you really want a malloc/free backend?
 I'm not a mm expert, but does malloc() behave exactly as the kernel
 counter parts?
 
 as stated above, A1) yes, we need our own allocator, and A2)
 yes as NUSE proofed, it behaves fine.

Okay.

 In UML we allocate a big file on the host side, mmap() it and give this 
 mapping
 to the kernel as physical memory such that any kernel allocator can work 
 with it.
 
 libos doesn't virtualize a physical memory but provide
 allocator functions returning memory block on a request
 instead.

Makes sense. I thought maybe it can help you reducing the code
footprint.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html