Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat

2017-10-18 Thread Daniel Borkmann

On 10/18/2017 03:25 PM, Tejun Heo wrote:

Hello, Daniel.

(cc'ing Dennis)

On Tue, Oct 17, 2017 at 04:55:51PM +0200, Daniel Borkmann wrote:

The set fixes a splat in devmap percpu allocation when we alloc
the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
patch 1 is rather small, so if this could be routed via -net, for
example, with Tejun's Ack that would be good. Patch 3 gets rid of
remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
internals and should not be used.

Thanks!

Daniel Borkmann (3):
   mm, percpu: add support for __GFP_NOWARN flag


This looks fine.


Great, thanks!


   bpf: fix splat for illegal devmap percpu allocation
   bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations


These look okay too but if it helps percpu allocator can expose the
maximum size / alignment supported to take out the guessing game too.


At least from the BPF side there's currently no infrastructure for exposing
the max possible alloc sizes for maps to e.g. user space as an indication.
There are a few users left in the tree where it would make sense to have
such helpers though:

  arch/tile/kernel/setup.c:729:   if (size < PCPU_MIN_UNIT_SIZE)
  arch/tile/kernel/setup.c:730:   size = PCPU_MIN_UNIT_SIZE;
  drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c:346: unsigned int max = (PCPU_MIN_UNIT_SIZE - sizeof(*pools)) << 3;
  drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c:352: /* make sure per cpu pool fits into PCPU_MIN_UNIT_SIZE */
  drivers/scsi/libfc/fc_exch.c:2488:   /* reduce range so per cpu pool fits into PCPU_MIN_UNIT_SIZE pool */
  drivers/scsi/libfc/fc_exch.c:2489:  pool_exch_range = (PCPU_MIN_UNIT_SIZE - sizeof(*pool)) /
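
A rough sketch of what such a helper could look like (the name
pcpu_alloc_max_size() is made up here, not an existing API; it would
simply wrap the allocator-internal limit so call-sites stop hard-coding
it):

  /* mm/percpu.c: hypothetical helper exposing the allocation limit */
  size_t pcpu_alloc_max_size(void)
  {
          /* largest size pcpu_alloc() currently accepts */
          return PCPU_MIN_UNIT_SIZE;
  }

  /* a call-site could then clamp or bail out like: */
  if (size > pcpu_alloc_max_size())
          return -E2BIG;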


Also, the reason why PCPU_MIN_UNIT_SIZE is what it is is because
nobody needed anything bigger.  Increasing the size doesn't really
cost much at least on 64bit archs.  Is that something we want to be
considering?


For devmap (and cpumap) itself it wouldn't make sense. For per-cpu
hashtable we could indeed consider it in the future.

Thanks,
Daniel


Re: [PATCH net 0/3] Fix for BPF devmap percpu allocation splat

2017-10-17 Thread Daniel Borkmann

On 10/17/2017 05:03 PM, David Laight wrote:

From: Daniel Borkmann

Sent: 17 October 2017 15:56

The set fixes a splat in devmap percpu allocation when we alloc
the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
patch 1 is rather small, so if this could be routed via -net, for
example, with Tejun's Ack that would be good. Patch 3 gets rid of
remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
internals and should not be used.


Does it make sense to allow the user program to try to allocate ever
smaller very large maps until it finds one that succeeds - thus
using up all the percpu space?

Or is this a 'root only' 'shoot self in foot' job?


It's root only, although John still has a pending fix to be flushed
out for -net first in the next days to actually enforce that cap
(devmap is not in an official kernel yet at this point, so all good).
Apart from this, all map allocs in general are accounted for as well.
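
(The accounting mentioned here boils down to the usual RLIMIT_MEMLOCK
charge against user->locked_vm; a simplified sketch, not the exact
kernel/bpf/syscall.c code:)

  static int charge_memlock(struct user_struct *user, u32 pages)
  {
          unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

          if (atomic_long_add_return(pages, &user->locked_vm) > limit) {
                  atomic_long_sub(pages, &user->locked_vm);
                  return -EPERM;
          }
          return 0;
  }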

Thanks,
Daniel


[PATCH net 3/3] bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations

2017-10-17 Thread Daniel Borkmann
PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
allocator. Given we support __GFP_NOWARN now, let's just let
the allocation request fail naturally instead. The two call
sites from BPF mistakenly assumed __GFP_NOWARN would work, so
no changes needed to their actual __alloc_percpu_gfp() calls
which use the flag already.

Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
---
 kernel/bpf/arraymap.c | 2 +-
 kernel/bpf/hashtab.c  | 4 ----
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 98c0f00..e263673 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -98,7 +98,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
array_size += (u64) attr->max_entries * elem_size * num_possible_cpus();
 
if (array_size >= U32_MAX - PAGE_SIZE ||
-   elem_size > PCPU_MIN_UNIT_SIZE || bpf_array_alloc_percpu(array)) {
+   bpf_array_alloc_percpu(array)) {
bpf_map_area_free(array);
return ERR_PTR(-ENOMEM);
}
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 431126f..6533f08 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -317,10 +317,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 */
goto free_htab;
 
-   if (percpu && round_up(htab->map.value_size, 8) > PCPU_MIN_UNIT_SIZE)
-   /* make sure the size for pcpu_alloc() is reasonable */
-   goto free_htab;
-
htab->elem_size = sizeof(struct htab_elem) +
  round_up(htab->map.key_size, 8);
if (percpu)
-- 
1.9.3



[PATCH net 0/3] Fix for BPF devmap percpu allocation splat

2017-10-17 Thread Daniel Borkmann
The set fixes a splat in devmap percpu allocation when we alloc
the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
patch 1 is rather small, so if this could be routed via -net, for
example, with Tejun's Ack that would be good. Patch 3 gets rid of
remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
internals and should not be used.

Thanks!

Daniel Borkmann (3):
  mm, percpu: add support for __GFP_NOWARN flag
  bpf: fix splat for illegal devmap percpu allocation
  bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations

 kernel/bpf/arraymap.c |  2 +-
 kernel/bpf/devmap.c   |  5 +++--
 kernel/bpf/hashtab.c  |  4 ----
 mm/percpu.c   | 15 ++++++++++-----
 4 files changed, 14 insertions(+), 12 deletions(-)

-- 
1.9.3



[PATCH net 1/3] mm, percpu: add support for __GFP_NOWARN flag

2017-10-17 Thread Daniel Borkmann
Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
Currently, we always throw a warning when size or alignment
is unsupported (and also dump stack on failed allocation
requests). The warning itself is harmless since we return
NULL anyway for any failed request, which callers are
required to handle anyway. However, it becomes harmful when
panic_on_warn is set.

The rationale for the WARN() in pcpu_alloc() is that it can
be tracked when larger-than-supported allocation requests are
made, such that allocation limits can be tweaked if warranted.
This makes sense for in-kernel users; however, there are users
of the pcpu allocator where the allocation size is derived from
user space requests, e.g. when creating BPF maps. In these cases,
the requests should fail gracefully without throwing a splat.

The current work-around was to check the allocation size against
the upper limit of PCPU_MIN_UNIT_SIZE at the call-sites and bail
out before calling pcpu_alloc() in order to avoid throwing the
WARN(). This is bad in multiple ways, since PCPU_MIN_UNIT_SIZE is
an implementation detail, and having such checks at the call-sites
only complicates the code for no good reason. Thus, let's fix it
generically by supporting the __GFP_NOWARN flag that users can
then pass when calling the __alloc_percpu_gfp() helper instead.

Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
Cc: Tejun Heo <t...@kernel.org>
Cc: Mark Rutland <mark.rutl...@arm.com>
---
 mm/percpu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index aa121ce..a0e0c82 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1329,7 +1329,9 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
  * @gfp: allocation flags
  *
  * Allocate percpu area of @size bytes aligned at @align.  If @gfp doesn't
- * contain %GFP_KERNEL, the allocation is atomic.
+ * contain %GFP_KERNEL, the allocation is atomic. If @gfp has __GFP_NOWARN
+ * then no warning will be triggered on invalid or failed allocation
+ * requests.
  *
  * RETURNS:
  * Percpu pointer to the allocated area on success, NULL on failure.
@@ -1337,10 +1339,11 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 gfp_t gfp)
 {
+   bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
+   bool do_warn = !(gfp & __GFP_NOWARN);
static int warn_limit = 10;
struct pcpu_chunk *chunk;
const char *err;
-   bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
int slot, off, cpu, ret;
unsigned long flags;
void __percpu *ptr;
@@ -1361,7 +1364,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE ||
 !is_power_of_2(align))) {
-   WARN(true, "illegal size (%zu) or align (%zu) for percpu allocation\n",
+   WARN(do_warn, "illegal size (%zu) or align (%zu) for percpu allocation\n",
 size, align);
return NULL;
}
@@ -1482,7 +1485,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 fail:
trace_percpu_alloc_percpu_fail(reserved, is_atomic, size, align);
 
-   if (!is_atomic && warn_limit) {
+   if (!is_atomic && do_warn && warn_limit) {
pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n",
size, align, is_atomic, err);
dump_stack();
@@ -1507,7 +1510,9 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
  *
  * Allocate zero-filled percpu area of @size bytes aligned at @align.  If
  * @gfp doesn't contain %GFP_KERNEL, the allocation doesn't block and can
- * be called from any context but is a lot more likely to fail.
+ * be called from any context but is a lot more likely to fail. If @gfp
+ * has __GFP_NOWARN then no warning will be triggered on invalid or failed
+ * allocation requests.
  *
  * RETURNS:
  * Percpu pointer to the allocated area on success, NULL on failure.
-- 
1.9.3



[PATCH net 2/3] bpf: fix splat for illegal devmap percpu allocation

2017-10-17 Thread Daniel Borkmann
It was reported that syzkaller was able to trigger a splat on
devmap percpu allocation due to illegal/unsupported allocation
request size passed to __alloc_percpu():

  [   70.094249] illegal size (32776) or align (8) for percpu allocation
  [   70.094256] [ cut here ]
  [   70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630
  [...]
  [   70.094325] Call Trace:
  [   70.094328]  __alloc_percpu_gfp+0x12/0x20
  [   70.094330]  dev_map_alloc+0x134/0x1e0
  [   70.094331]  SyS_bpf+0x9bc/0x1610
  [   70.094333]  ? selinux_task_setrlimit+0x5a/0x60
  [   70.094334]  ? security_task_setrlimit+0x43/0x60
  [   70.094336]  entry_SYSCALL_64_fastpath+0x1a/0xa5

This was due to a too large max_entries for the map such that we
surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to
fail naturally here, so switch to __alloc_percpu_gfp() and pass
__GFP_NOWARN instead.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Reported-by: Mark Rutland <mark.rutl...@arm.com>
Reported-by: Shankara Pailoor <sp3...@columbia.edu>
Reported-by: Richard Weinberger <rich...@nod.at>
Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
Cc: John Fastabend <john.fastab...@gmail.com>
---
 kernel/bpf/devmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e093d9a..920428d 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -111,8 +111,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
err = -ENOMEM;
 
/* A per cpu bitfield with a bit per possible net device */
-   dtab->flush_needed = __alloc_percpu(dev_map_bitmap_size(attr),
-   __alignof__(unsigned long));
+   dtab->flush_needed = __alloc_percpu_gfp(dev_map_bitmap_size(attr),
+   __alignof__(unsigned long),
+   GFP_KERNEL | __GFP_NOWARN);
if (!dtab->flush_needed)
goto free_dtab;
 
-- 
1.9.3



Re: [PATCH] bpf: devmap: Check attr->max_entries more carefully

2017-10-17 Thread Daniel Borkmann

On 10/17/2017 12:29 PM, Mark Rutland wrote:

On Mon, Oct 16, 2017 at 08:52:13PM +0200, Daniel Borkmann wrote:

[ +Tejun, Mark, John ]

On 10/16/2017 12:00 AM, Richard Weinberger wrote:

max_entries is user controlled and used as input for __alloc_percpu().
This function expects that the allocation size is a power of two and
less than PCPU_MIN_UNIT_SIZE.
Otherwise a WARN() is triggered.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Reported-by: Shankara Pailoor <sp3...@columbia.edu>
Reported-by: syzkaller <syzkal...@googlegroups.com>
Signed-off-by: Richard Weinberger <rich...@nod.at>


Thanks for the patch, Richard. There was a prior discussion here [1] on
the same issue; I thought this would have been resolved by now, but it
looks like it's still open and there was never a follow-up, at least I
don't see it in the percpu tree, if I didn't miss anything.


Sorry, this was on my todo list, but I've been bogged down with some
other work.


Ok, no problem.


I would suggest we do the following below and pass __GFP_NOWARN from the
BPF side to the per-cpu allocs. This is kind of a generic 'issue' and we
shouldn't add more code which bails out anyway just to work around the
WARN(). Let's handle it properly instead.


Agreed. The below patch looks good to me, (with the suggested change to
the BPF side).


If Tejun is fine with the one below, I could cook an official patch and
clean up the remaining call-sites from BPF which have a similar pattern.


That would be great; thanks for taking this on.


I'll prepare a set including BPF side for today.

Thanks,
Daniel


Re: [PATCH 3/3] bpf: Make sure that ->comm does not change under us.

2017-10-16 Thread Daniel Borkmann

On 10/17/2017 12:10 AM, Alexei Starovoitov wrote:

On Mon, Oct 16, 2017 at 2:10 PM, Richard Weinberger <rich...@nod.at> wrote:

Am Montag, 16. Oktober 2017, 23:02:06 CEST schrieb Daniel Borkmann:

On 10/16/2017 10:55 PM, Richard Weinberger wrote:

Am Montag, 16. Oktober 2017, 22:50:43 CEST schrieb Daniel Borkmann:

   struct task_struct *task = current;

+ task_lock(task);

   strncpy(buf, task->comm, size);

+ task_unlock(task);


Wouldn't this potentially lead to a deadlock? E.g. you attach yourself
to task_lock() / spin_lock() / etc, and then the BPF prog triggers the
bpf_get_current_comm() taking the lock again ...


Yes, but doesn't the same apply to the use case when I attach to strncpy()
and run bpf_get_current_comm()?


You mean due to recursion? In that case trace_call_bpf() would bail out
due to the bpf_prog_active counter.


Ah, that's true.
So, when someone wants to use bpf_get_current_comm() while tracing task_lock,
we have a problem. I agree.
On the other hand, without locking the function may return wrong results.


it will surely race with somebody else setting task comm and it's fine.
all of bpf tracing is read-only, so locks are only allowed inside bpf core
bits like maps. Taking core locks like task_lock() is quite scary.
bpf scripts rely on bpf_probe_read() of all sorts of kernel fields
so reading comm here w/o lock is fine.


Yeah, and perf_event_comm() -> perf_event_comm_event() out of __set_task_comm()
takes the same approach wrt the comm read-out.
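
(I.e. a plain, lock-free copy stays fine as long as the destination is
always NUL-terminated, whatever comm changes to underneath; roughly:)

  static void copy_comm(struct task_struct *task, char *buf, u32 size)
  {
          /* racy by design: comm may change concurrently, that's ok */
          strncpy(buf, task->comm, size);
          buf[size - 1] = 0;
  }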


Re: [PATCH 3/3] bpf: Make sure that ->comm does not change under us.

2017-10-16 Thread Daniel Borkmann

On 10/16/2017 10:55 PM, Richard Weinberger wrote:

Am Montag, 16. Oktober 2017, 22:50:43 CEST schrieb Daniel Borkmann:

struct task_struct *task = current;

+   task_lock(task);

strncpy(buf, task->comm, size);

+   task_unlock(task);


Wouldn't this potentially lead to a deadlock? E.g. you attach yourself
to task_lock() / spin_lock() / etc, and then the BPF prog triggers the
bpf_get_current_comm() taking the lock again ...


Yes, but doesn't the same apply to the use case when I attach to strncpy()
and run bpf_get_current_comm()?


You mean due to recursion? In that case trace_call_bpf() would bail out
due to the bpf_prog_active counter.


Re: [PATCH 3/3] bpf: Make sure that ->comm does not change under us.

2017-10-16 Thread Daniel Borkmann

On 10/16/2017 08:18 PM, Richard Weinberger wrote:

Sadly we cannot use get_task_comm() since bpf_get_current_comm()
allows truncation.

Signed-off-by: Richard Weinberger 
---
  kernel/bpf/helpers.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 511c9d522cfc..4b042b24524d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 

  /* If kernel subsystem is allowing eBPF programs to call this function,
   * inside its own verifier_ops->get_func_proto() callback it should return
@@ -149,7 +150,9 @@ BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
  {
struct task_struct *task = current;

+   task_lock(task);
strncpy(buf, task->comm, size);
+   task_unlock(task);


Wouldn't this potentially lead to a deadlock? E.g. you attach yourself
to task_lock() / spin_lock() / etc, and then the BPF prog triggers the
bpf_get_current_comm() taking the lock again ...


/* Verifier guarantees that size > 0. For task->comm exceeding
 * size, guarantee that buf is %NUL-terminated. Unconditionally





Re: [PATCH 2/3] bpf: Remove dead variable

2017-10-16 Thread Daniel Borkmann

On 10/16/2017 09:22 PM, Richard Weinberger wrote:
[...]

So, I can happily squash 2/3 into 1/3 and resend.


Yeah, please just squash them.

Thanks,
Daniel


Re: [PATCH 2/3] bpf: Remove dead variable

2017-10-16 Thread Daniel Borkmann

On 10/16/2017 08:59 PM, Richard Weinberger wrote:

Am Montag, 16. Oktober 2017, 20:54:36 CEST schrieb Daniel Borkmann:

On 10/16/2017 08:18 PM, Richard Weinberger wrote:

task is never used in bpf_get_current_uid_gid(), kill it.

Signed-off-by: Richard Weinberger <rich...@nod.at>
---

   kernel/bpf/helpers.c | 1 -
   1 file changed, 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e8845adcd15e..511c9d522cfc 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -131,7 +131,6 @@ const struct bpf_func_proto bpf_get_current_pid_tgid_proto = {
   BPF_CALL_0(bpf_get_current_uid_gid)
   {

-   struct task_struct *task = current;

kuid_t uid;
kgid_t gid;


Needs to be squashed into patch 1/3?


I can squash it into 1/3, I kept it that way because
even without 1/3 this variable is unused.


Hmm, the helper looks like the below. In patch 1/3 you removed
the 'if (unlikely(!task))' test where the variable was used before,
so 2/3 without the 1/3 would result in a compile error.

BPF_CALL_0(bpf_get_current_uid_gid)
{
struct task_struct *task = current;
kuid_t uid;
kgid_t gid;

if (unlikely(!task))
return -EINVAL;

current_uid_gid(&uid, &gid);
return (u64) from_kgid(&init_user_ns, gid) << 32 |
       from_kuid(&init_user_ns, uid);
}


Re: [PATCH 2/3] bpf: Remove dead variable

2017-10-16 Thread Daniel Borkmann

On 10/16/2017 08:18 PM, Richard Weinberger wrote:

task is never used in bpf_get_current_uid_gid(), kill it.

Signed-off-by: Richard Weinberger 
---
  kernel/bpf/helpers.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e8845adcd15e..511c9d522cfc 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -131,7 +131,6 @@ const struct bpf_func_proto bpf_get_current_pid_tgid_proto = {

  BPF_CALL_0(bpf_get_current_uid_gid)
  {
-   struct task_struct *task = current;
kuid_t uid;
kgid_t gid;




Needs to be squashed into patch 1/3?


Re: [PATCH] bpf: devmap: Check attr->max_entries more carefully

2017-10-16 Thread Daniel Borkmann

[ +Tejun, Mark, John ]

On 10/16/2017 12:00 AM, Richard Weinberger wrote:

max_entries is user controlled and used as input for __alloc_percpu().
This function expects that the allocation size is a power of two and
less than PCPU_MIN_UNIT_SIZE.
Otherwise a WARN() is triggered.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Reported-by: Shankara Pailoor 
Reported-by: syzkaller 
Signed-off-by: Richard Weinberger 


Thanks for the patch, Richard. There was a prior discussion here [1] on
the same issue; I thought this would have been resolved by now, but it
looks like it's still open and there was never a follow-up, at least I
don't see it in the percpu tree, if I didn't miss anything.

I would suggest we do the following below and pass __GFP_NOWARN from the
BPF side to the per-cpu allocs. This is kind of a generic 'issue' and we
shouldn't add more code which bails out anyway just to work around the
WARN(). Let's handle it properly instead.

If Tejun is fine with the one below, I could cook an official patch and
clean up the remaining call-sites from BPF which have a similar pattern.

  [1] https://patchwork.kernel.org/patch/9975851/

Thanks,
Daniel

 mm/percpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 59d44d6..5d9414e 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1357,7 +1357,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,

if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE ||
 !is_power_of_2(align))) {
-   WARN(true, "illegal size (%zu) or align (%zu) for percpu allocation\n",
+   WARN(!(gfp & __GFP_NOWARN),
+"illegal size (%zu) or align (%zu) for percpu allocation\n",
 size, align);
return NULL;
}
@@ -1478,7 +1479,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 fail:
trace_percpu_alloc_percpu_fail(reserved, is_atomic, size, align);

-   if (!is_atomic && warn_limit) {
+   if (!is_atomic && warn_limit && !(gfp & __GFP_NOWARN)) {
pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n",
size, align, is_atomic, err);
dump_stack();
--
1.9.3


Re: [PATCH] Add -target to clang switch while cross compiling.

2017-10-14 Thread Daniel Borkmann

On 10/13/2017 09:24 PM, Abhijit Ayarekar wrote:

Update to llvm excludes assembly instructions.
llvm git revision is below

commit 65fad7c26569 ("bpf: add inline-asm support")

This change will be part of llvm  release 6.0

__ASM_SYSREG_H define is not required for native compile.
-target switch includes appropriate target specific files
while cross compiling

Tested on x86 and arm64.

Signed-off-by: Abhijit Ayarekar <abhijit.ayare...@caviumnetworks.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH][bpf-next] bpf: remove redundant variable old_flags

2017-10-11 Thread Daniel Borkmann

On 10/11/2017 12:56 PM, Colin King wrote:

From: Colin Ian King <colin.k...@canonical.com>

Variable old_flags is being assigned but is never read; it is redundant
and can be removed.

Cleans up clang warning: Value stored to 'old_flags' is never read

Signed-off-by: Colin Ian King <colin.k...@canonical.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH] once: switch to new jump label API

2017-10-09 Thread Daniel Borkmann

On 10/09/2017 10:27 PM, Eric Biggers wrote:

On Fri, Sep 15, 2017 at 09:07:51PM -0700, Eric Biggers wrote:

On Tue, Aug 22, 2017 at 02:44:41PM -0400, Hannes Frederic Sowa wrote:

Eric Biggers  writes:

From: Eric Biggers 

Switch the DO_ONCE() macro from the deprecated jump label API to the new
one.  The new one is more readable, and for DO_ONCE() it also makes the
generated code more icache-friendly: now the one-time initialization
code is placed out-of-line at the jump target, rather than at the inline
fallthrough case.

Signed-off-by: Eric Biggers 


Acked-by: Hannes Frederic Sowa 

Great!  Who though is the maintainer for this code?  It seems it was originally
taken by David Miller through the networking tree.  David, are you taking
further patches to the "once" functions, or should I be trying to get this into
-mm, or somewhere else?

Eric


Ping.


Given the original code was accepted against the net-next tree, and the
major users of the API are networking related anyway, it should be fine
to route it through this tree here as well. Maybe resend the patch with a
[PATCH net-next] in the subject line (as usually done) to make the
targeted tree clearer.
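
As an aside, a usage sketch of the DO_ONCE() machinery being discussed
(the seed example and names are illustrative only):

  #include <linux/once.h>
  #include <linux/random.h>

  static u32 seed __read_mostly;

  static u32 get_seed(void)
  {
          /* init runs exactly once; now emitted out of line */
          DO_ONCE(get_random_bytes, &seed, sizeof(seed));
          return seed;
  }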


Re: [PATCH 1/1] xdp: Sample xdp program implementing ip forward

2017-10-03 Thread Daniel Borkmann

On 10/03/2017 09:37 AM, cjacob wrote:

Implements port to port forwarding with route table and arp table
lookup for ipv4 packets using bpf_redirect helper function and
lpm_trie  map.

Signed-off-by: cjacob 


Thanks for the patch, just few minor comments below!

Note, should be full name, e.g.:

  Signed-off-by: Christina Jacob 

Also, your From: only shows 'cjacob', as can be seen from the cover letter
as well, so perhaps check your git settings to use the full name:

  cjacob (1):
xdp: Sample xdp program implementing ip forward

If there's one single patch, then cover letter is not needed, only
for >1 sets.

[...]

+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+#include 
+#include 
+
+struct trie_value {
+   __u8 prefix[4];
+   long value;
+   int gw;
+   int ifindex;
+   int metric;
+};
+
+union key_4 {
+   u32 b32[2];
+   u8 b8[8];
+};
+
+struct arp_entry {
+   int dst;
+   long mac;
+};
+
+struct direct_map {
+   long mac;
+   int ifindex;
+   struct arp_entry arp;
+};
+
+/* Map for trie implementation*/
+struct bpf_map_def SEC("maps") lpm_map = {
+   .type = BPF_MAP_TYPE_LPM_TRIE,
+   .key_size = 8,
+   .value_size =
+   sizeof(struct trie_value),


(Nit: there are couple of such breaks throughout the patch, can we
 just use single line for such cases where reasonable?)


+   .max_entries = 50,
+   .map_flags = BPF_F_NO_PREALLOC,
+};
+
+/* Map for counter*/
+struct bpf_map_def SEC("maps") rxcnt = {
+   .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(long),
+   .max_entries = 256,
+};
+
+/* Map for ARP table*/
+struct bpf_map_def SEC("maps") arp_table = {
+   .type = BPF_MAP_TYPE_HASH,
+   .key_size = sizeof(int),
+   .value_size = sizeof(long),


Perhaps these should be proper structs here, such that it
becomes easier to read/handle later on lookup.
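
E.g. (a sketch of the suggestion; the struct name is made up):

  struct arp_value {
          unsigned char mac[ETH_ALEN];
  };

  struct bpf_map_def SEC("maps") arp_table = {
          .type = BPF_MAP_TYPE_HASH,
          .key_size = sizeof(int),
          .value_size = sizeof(struct arp_value),
          .max_entries = 50,
  };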


+   .max_entries = 50,
+};
+
+/* Map to keep the exact match entries in the route table*/
+struct bpf_map_def SEC("maps") exact_match = {
+   .type = BPF_MAP_TYPE_HASH,
+   .key_size = sizeof(int),
+   .value_size = sizeof(struct direct_map),
+   .max_entries = 50,
+};
+
+/**
+ * Function to set source and destination mac of the packet
+ */
+static inline void set_src_dst_mac(void *data, void *src, void *dst)
+{
+   unsigned short *p  = data;
+   unsigned short *dest   = dst;
+   unsigned short *source = src;
+
+   p[3] = source[0];
+   p[4] = source[1];
+   p[5] = source[2];
+   p[0] = dest[0];
+   p[1] = dest[1];
+   p[2] = dest[2];


You could just use __builtin_memcpy() given length is
constant anyway, so LLVM will do the inlining.
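
E.g. (a sketch of the suggestion, assuming the usual 6-byte Ethernet
addresses):

  static inline void set_src_dst_mac(void *data, void *src, void *dst)
  {
          struct ethhdr *eth = data;

          __builtin_memcpy(eth->h_source, src, ETH_ALEN);
          __builtin_memcpy(eth->h_dest, dst, ETH_ALEN);
  }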


+}
+
+/**
+ * Parse IPV4 packet to get SRC, DST IP and protocol
+ */
+static inline int parse_ipv4(void *data, u64 nh_off, void *data_end,
+unsigned int *src, unsigned int *dest)
+{
+   struct iphdr *iph = data + nh_off;
+
+   if (iph + 1 > data_end)
+   return 0;
+   *src = (unsigned int)iph->saddr;
+   *dest = (unsigned int)iph->daddr;


Why not stay with __be32 types?
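
I.e. something like (sketch):

  static inline int parse_ipv4(void *data, u64 nh_off, void *data_end,
                               __be32 *src, __be32 *dest)
  {
          struct iphdr *iph = data + nh_off;

          if ((void *)(iph + 1) > data_end)
                  return 0;
          *src = iph->saddr;
          *dest = iph->daddr;
          return iph->protocol;
  }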


+   return iph->protocol;
+}
+
+SEC("xdp3")
+int xdp_prog3(struct xdp_md *ctx)
+{
+   void *data_end = (void *)(long)ctx->data_end;
+   void *data = (void *)(long)ctx->data;
+   struct ethhdr *eth = data;
+   int rc = XDP_DROP, forward_to;
+   long *value;
+   struct trie_value *prefix_value;
+   long *dest_mac = NULL, *src_mac = NULL;
+   u16 h_proto;
+   u64 nh_off;
+   u32 ipproto;
+   union key_4 key4;
+
+   nh_off = sizeof(*eth);
+   if (data + nh_off > data_end)
+   return rc;
+
+   h_proto = eth->h_proto;
+
+   if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
+   struct vlan_hdr *vhdr;
+
+   vhdr = data + nh_off;
+   nh_off += sizeof(struct vlan_hdr);
+   if (data + nh_off > data_end)
+   return rc;
+   h_proto = vhdr->h_vlan_encapsulated_proto;
+   }
+   if (h_proto == htons(ETH_P_ARP)) {
+   return XDP_PASS;
+   } else if (h_proto == htons(ETH_P_IP)) {
+   int src_ip = 0, dest_ip = 0;
+   struct direct_map *direct_entry;
+
+   ipproto = parse_ipv4(data, nh_off, data_end, &src_ip, &dest_ip);
+   direct_entry = (struct direct_map *)bpf_map_lookup_elem
+   (&exact_match, &dest_ip);
+   /*check for exact match, this would give a faster lookup*/
+   if (direct_entry && direct_entry->mac &&
+   direct_entry->arp.mac) {
+   src_mac = &direct_entry->mac;
+   dest_mac = &direct_entry->arp.mac;
+   forward_to = direct_entry->ifindex;
+   } else {
+  

Re: EBPF-triggered WARNING at mm/percpu.c:1361 in v4-14-rc2

2017-09-28 Thread Daniel Borkmann

On 09/28/2017 04:45 PM, Mark Rutland wrote:

On Thu, Sep 28, 2017 at 04:37:46PM +0200, Daniel Borkmann wrote:

On 09/28/2017 01:27 PM, Mark Rutland wrote:

Hi,

While fuzzing v4.14-rc2 with Syzkaller, I found it was possible to trigger the
warning at mm/percpu.c:1361, on both arm64 and x86_64. This appears to require
increasing RLIMIT_MEMLOCK, so to the best of my knowledge this cannot be
triggered by an unprivileged user.

I've included example splats for both x86_64 and arm64, along with a C
reproducer, inline below.

It looks like dev_map_alloc() requests a percpu allocation of 32776 bytes, which
is larger than the maximum supported allocation size of 32768 bytes.

I wonder if it would make more sense to pr_warn() for sizes that are too
large, so that callers don't have to roll their own checks against
PCPU_MIN_UNIT_SIZE?


Perhaps the pr_warn() should be ratelimited; or could there be an
option where we only return NULL, not triggering a warn at all (which
would likely be what callers might do anyway when checking against
PCPU_MIN_UNIT_SIZE and then bailing out)?


Those both make sense to me; checking __GFP_NOWARN should be easy
enough.

Just to check, do you think that dev_map_alloc() should explicitly test
the size against PCPU_MIN_UNIT_SIZE, prior to calling pcpu_alloc()?


Looks like there are users of __alloc_percpu_gfp() with __GFP_NOWARN
in a couple of places already, but __GFP_NOWARN is ignored. It would make
sense to support that indeed, to avoid throwing the warn and just let
the caller bail out when it sees the NULL as usual. In some cases (like
the current ones) this makes sense; in others probably not so much, and
a WARN would be the preferred way. But __alloc_percpu_gfp() could provide
such an option to simplify some of the code that pre-checks against the
limit of PCPU_MIN_UNIT_SIZE before calling the allocator and doesn't
throw a WARN either; most likely such a check is just there to prevent
the user from seeing exactly this splat.
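
(A caller could then simply do something like the below and bail out on
NULL; sketch only:)

  ptr = __alloc_percpu_gfp(size, align, GFP_KERNEL | __GFP_NOWARN);
  if (!ptr)
          return -ENOMEM;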

Thanks,
Daniel


Re: EBPF-triggered WARNING at mm/percpu.c:1361 in v4-14-rc2

2017-09-28 Thread Daniel Borkmann

On 09/28/2017 01:27 PM, Mark Rutland wrote:

Hi,

While fuzzing v4.14-rc2 with Syzkaller, I found it was possible to trigger the
warning at mm/percpu.c:1361, on both arm64 and x86_64. This appears to require
increasing RLIMIT_MEMLOCK, so to the best of my knowledge this cannot be
triggered by an unprivileged user.

I've included example splats for both x86_64 and arm64, along with a C
reproducer, inline below.

It looks like dev_map_alloc() requests a percpu allocation of 32776 bytes, which
is larger than the maximum supported allocation size of 32768 bytes.

I wonder if it would make more sense to pr_warn() for sizes that are too
large, so that callers don't have to roll their own checks against
PCPU_MIN_UNIT_SIZE?


Perhaps the pr_warn() should be ratelimited; or could there be an
option where we only return NULL, not triggering a warn at all (which
would likely be what callers might do anyway when checking against
PCPU_MIN_UNIT_SIZE and then bailing out)?


e.g. something like:


diff --git a/mm/percpu.c b/mm/percpu.c
index 59d44d6..f731c45 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1355,8 +1355,13 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 bits = size >> PCPU_MIN_ALLOC_SHIFT;
 bit_align = align >> PCPU_MIN_ALLOC_SHIFT;

-   if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE ||
-!is_power_of_2(align))) {
+   if (unlikely(size > PCPU_MIN_UNIT_SIZE)) {
+   pr_warn("cannot allocate pcpu chunk of size %zu (max %zu)\n",
+   size, PCPU_MIN_UNIT_SIZE);
+   return NULL;
+   }
+
+   if (unlikely(!size || align > PAGE_SIZE || !is_power_of_2(align))) {
 WARN(true, "illegal size (%zu) or align (%zu) for percpu allocation\n",
  size, align);
 return NULL;


Thanks,
Mark.



Example splat (x86_64)

[  138.144185] illegal size (32776) or align (8) for percpu allocation
[  138.150452] [ cut here ]
[  138.155074] WARNING: CPU: 1 PID: 2223 at mm/percpu.c:1361 
pcpu_alloc+0x7c/0x5f0
[  138.162369] Modules linked in:
[  138.165423] CPU: 1 PID: 2223 Comm: repro Not tainted 4.14.0-rc2 #3
[  138.171593] Hardware name: LENOVO 7484A3G/LENOVO, BIOS 5CKT54AUS 09/07/2009
[  138.178543] task: 881b73069980 task.stack: a36f40f9
[  138.184455] RIP: 0010:pcpu_alloc+0x7c/0x5f0
[  138.188633] RSP: 0018:a36f40f93e00 EFLAGS: 00010286
[  138.193853] RAX: 0037 RBX:  RCX: 
[  138.200974] RDX: 881b7ec94a40 RSI: 881b7ec8cbb8 RDI: 881b7ec8cbb8
[  138.208097] RBP: a36f40f93e68 R08: 0001 R09: 02c4
[  138.215219] R10: 562a577047f0 R11: a10ad7cd R12: 881b73216cc0
[  138.222343] R13: 0014 R14: 7ffebeed0900 R15: ffea
[  138.229463] FS:  7fef84a15700() GS:881b7ec8() 
knlGS:
[  138.237538] CS:  0010 DS:  ES:  CR0: 80050033
[  138.243274] CR2: 7fef84497ba0 CR3: 0001b3235000 CR4: 000406e0
[  138.250397] Call Trace:
[  138.252844]  __alloc_percpu+0x10/0x20
[  138.256508]  dev_map_alloc+0x122/0x1b0
[  138.260255]  SyS_bpf+0x8f9/0x10b0
[  138.263570]  ? security_task_setrlimit+0x3e/0x60
[  138.268184]  ? do_prlimit+0xa6/0x1f0
[  138.271760]  entry_SYSCALL_64_fastpath+0x13/0x94
[  138.276372] RIP: 0033:0x7fef84546259
[  138.279946] RSP: 002b:7ffebeed09b8 EFLAGS: 0206 ORIG_RAX: 
0141
[  138.287503] RAX: ffda RBX:  RCX: 7fef84546259
[  138.294627] RDX: 0014 RSI: 7ffebeed09d0 RDI: 
[  138.301749] RBP: 562a57704780 R08: 7fef84810cb0 R09: 7ffebeed0ae8
[  138.308874] R10: 562a577047f0 R11: 0206 R12: 562a577045d0
[  138.315997] R13: 7ffebeed0ae0 R14:  R15: 
[  138.323122] Code: fe 00 10 00 00 77 10 48 8b 4d b8 48 89 c8 48 83 e8 01 48 85 c1 
74 1e 48 8b 55 b8 48 8b 75 c0 48 c7 c7 90 5e be a0 e8 40 88 f3 ff <0f> ff 45 31 
ed e9 5e 02 00 00 4c 8b 6d c0 49 89 cc 49 c1 ec 02
[  138.341953] ---[ end trace b6e380365bfb8a36 ]---




Example splat (arm64)

[   17.287365] illegal size (32776) or align (8) for percpu allocation
[   17.295347] [ cut here ]
[   17.297191] WARNING: CPU: 1 PID: 1440 at mm/percpu.c:1361 
pcpu_alloc+0x120/0x9f0
[   17.307723] Kernel panic - not syncing: panic_on_warn set ...
[   17.307723]
[   17.311755] CPU: 1 PID: 1440 Comm: repro Not tainted 
4.14.0-rc2-1-gd7ad33d #115
[   17.320675] Hardware name: linux,dummy-virt (DT)
[   17.323858] Call trace:
[   17.325246] [] dump_backtrace+0x0/0x558
[   17.332538] [] show_stack+0x20/0x30
[   17.340391] [] dump_stack+0x128/0x1a0
[   17.342081] [] panic+0x250/0x518
[   17.344096] [] __warn+0x2a4/0x310
[   17.345654] [] report_bug+0x1d4/0x290
[   17.348652] [] 

Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50

2017-09-26 Thread Daniel Borkmann

On 09/26/2017 11:51 PM, Richard Weinberger wrote:

Alexei,

CC'ing Josh and Ingo.

On Tuesday, 26 September 2017 at 06:09:02 CEST, Alexei Starovoitov wrote:

On Mon, Sep 25, 2017 at 11:23:31PM +0200, Richard Weinberger wrote:

Hi!

While playing with bcc's opensnoop tool on Linux 4.14-rc2 I managed to
trigger this splat:

[  297.629773] WARNING: kernel stack frame pointer at 880156a5fea0 in
bash:2103 has bad value 7ffec7d87e50
[  297.629777] unwind stack type:0 next_sp:  (null) mask:0x6
graph_idx:0
[  297.629783] 88015b207ae0: 88015b207b68 (0x88015b207b68)
[  297.629790] 88015b207ae8: b163c00e
(__save_stack_trace+0x6e/
0xd0)
[  297.629792] 88015b207af0:  ...
[  297.629795] 88015b207af8: 880156a58000 (0x880156a58000)
[  297.629799] 88015b207b00: 880156a6 (0x880156a6)
[  297.629800] 88015b207b08:  ...
[  297.629803] 88015b207b10: 0006 (0x6)
[  297.629806] 88015b207b18: 880151b02700 (0x880151b02700)
[  297.629809] 88015b207b20: 0101 (0x101)
[  297.629812] 88015b207b28: 880156a5fea0 (0x880156a5fea0)
[  297.629815] 88015b207b30: 88015b207ae0 (0x88015b207ae0)
[  297.629818] 88015b207b38: c0050282 (0xc0050282)
[  297.629819] 88015b207b40:  ...
[  297.629822] 88015b207b48: 0100 (0x100)
[  297.629825] 88015b207b50: 880157b98280 (0x880157b98280)
[  297.629828] 88015b207b58: 880157b98380 (0x880157b98380)
[  297.629831] 88015b207b60: 88015ad2b500 (0x88015ad2b500)
[  297.629834] 88015b207b68: 88015b207b78 (0x88015b207b78)
[  297.629838] 88015b207b70: b163c086
(save_stack_trace+0x16/0x20) [  297.629841] 88015b207b78:
88015b207da8 (0x88015b207da8) [  297.629847] 88015b207b80:
b18a8ed6 (save_stack+0x46/0xd0) [  297.629850] 88015b207b88:
004c (0x4c)
[  297.629852] 88015b207b90: 88015b207ba0 (0x88015b207ba0)
[  297.629855] 88015b207b98: 8801 (0x8801)
[  297.629859] 88015b207ba0: b163c086
(save_stack_trace+0x16/0x20) [  297.629864] 88015b207ba8:
b18a8ed6 (save_stack+0x46/0xd0) [  297.629868] 88015b207bb0:
b18a9752 (kasan_slab_free+0x72/0xc0)

Thanks for the report!
I'm not sure I understand what's going on here.
It seems you have kasan enabled and it's trying to do save_stack()
and something is crashing?
I don't see any bpf related helpers in the stack trace.
Which architecture is this? and .config ?
Is bpf jit enabled? If so, make sure that net.core.bpf_jit_kallsyms=1


I found some time to dig a little further.
It seems to happen only when CONFIG_DEBUG_SPINLOCK is enabled, please see the
attached .config. The JIT is off.
KAsan is also not involved at all, the regular stack saving machinery from the
trace framework initiates the stack unwinder.

The issue arises as soon as in pre_handler_kretprobe() raw_spin_lock_irqsave()
is being called.
It happens on all releases that have commit c32c47c68a0a ("x86/unwind: Warn on
bad frame pointer").
Interestingly it does not happen when I run
samples/kprobes/kretprobe_example.ko. So, BPF must be involved somehow.


Some time ago, Josh fixed this one here, which seems perhaps related in
some way; it was triggerable back then from one of the BPF tracing
samples if I recall correctly:

commit a8b7a92318b6d7779f6d8e9aa6ba0e3de01a8943
Author: Josh Poimboeuf <jpoim...@redhat.com>
Date:   Wed Apr 12 13:47:12 2017 -0500

x86/unwind: Silence entry-related warnings

A few people have reported unwinder warnings like the following:

  WARNING: kernel stack frame pointer at c9fe7ff0 in rsync:1157 has 
bad value   (null)
  unwind stack type:0 next_sp:  (null) mask:2 graph_idx:0
  c9fe7f98: c9fe7ff0 (0xc9fe7ff0)
  c9fe7fa0: b7000f56 (trace_hardirqs_off_thunk+0x1a/0x1c)
  c9fe7fa8: 0246 (0x246)
  c9fe7fb0:  ...
  c9fe7fc0: 7ffe3af639bc (0x7ffe3af639bc)
  c9fe7fc8: 0006 (0x6)
  c9fe7fd0: 7f80af433fc5 (0x7f80af433fc5)
  c9fe7fd8: 7ffe3af638e0 (0x7ffe3af638e0)
  c9fe7fe0: 7ffe3af638e0 (0x7ffe3af638e0)
  c9fe7fe8: 7ffe3af63970 (0x7ffe3af63970)
  c9fe7ff0:  ...
  c9fe7ff8: b7b74b9a 
(entry_SYSCALL_64_after_swapgs+0x17/0x4f)

This warning can happen when unwinding a code path where an interrupt
occurred in x86 entry code before it set up the first stack frame.
Silently ignore any warnings for this case.

    Reported-by: Daniel Borkmann <dan...@iogearbox.net>
Reported-by: Dave Jones <da...@codemonkey.org.uk>
Signed-off-by: Josh Poimboeuf <jpoim...@r

Re: [PATCH v4 4/4] samples/bpf: Add documentation on cross compilation

2017-09-20 Thread Daniel Borkmann

On 09/20/2017 06:11 PM, Joel Fernandes wrote:

Acked-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Joel Fernandes <joe...@google.com>


(Minor typo pointed out by Randy, but rest looks fine.)

Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH v4 3/4] samples/bpf: Fix pt_regs issues when cross-compiling

2017-09-20 Thread Daniel Borkmann

On 09/20/2017 06:11 PM, Joel Fernandes wrote:

BPF samples fail to build when cross-compiling for ARM64 because of incorrect
pt_regs param selection. This is because clang defines __x86_64__ and
bpf_headers thinks we're building for x86. Since clang is building for the BPF
target, it shouldn't make assumptions about what target the BPF program is
going to run on. To fix this, let's pass ARCH so the header knows which target
the BPF program is being compiled for and can use the correct pt_regs code.

Acked-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Joel Fernandes <joe...@google.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>
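
As a rough illustration of the idea above (hypothetical defines and macro
names, not the actual samples/bpf or bpf header change), the header would
key off an arch define passed on the clang command line rather than the
compiler's own built-ins:

/* Hypothetical sketch: the arch comes from the build system, e.g. via
 * -D__TARGET_ARCH_arm64, instead of clang's __x86_64__/__aarch64__
 * built-ins, which say nothing about the kernel the program runs on. */
#if defined(__TARGET_ARCH_x86)
#define PT_REGS_PARM1(x) ((x)->di)
#elif defined(__TARGET_ARCH_arm64)
#define PT_REGS_PARM1(x) ((x)->regs[0])
#else
#error "target arch not specified"
#endif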


Re: [PATCH v4 2/4] samples/bpf: Enable cross compiler support

2017-09-20 Thread Daniel Borkmann

On 09/20/2017 06:11 PM, Joel Fernandes wrote:

When cross compiling, bpf samples use HOSTCC for compiling the non-BPF part of
the sample, however what we really want is to use the cross compiler to build
for the cross target since that is what will load and run the BPF sample.
Detect this and compile samples correctly.

Acked-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Joel Fernandes <joe...@google.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH v4 1/4] samples/bpf: Use getppid instead of getpgrp for array map stress

2017-09-20 Thread Daniel Borkmann

On 09/20/2017 06:11 PM, Joel Fernandes wrote:

When cross-compiling the bpf sample map_perf_test for aarch64, I find that
__NR_getpgrp is undefined. This causes build errors. This syscall is deprecated
and requires defining __ARCH_WANT_SYSCALL_DEPRECATED. To avoid having to define
that, just use a different syscall (getppid) for the array map stress test.

Acked-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Joel Fernandes <joe...@google.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>
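
For illustration only (a hypothetical sketch, not the actual map_perf_test
source), the user-space side of such a stress loop could look like this:

#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical sketch: getppid exists on every architecture, so this
 * builds for aarch64 without __ARCH_WANT_SYSCALL_DEPRECATED, unlike a
 * loop built around __NR_getpgrp. */
static void stress_array_map(int iterations)
{
	int i;

	for (i = 0; i < iterations; i++)
		syscall(__NR_getppid);
}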


Re: selftests/bpf doesn't compile

2017-09-18 Thread Daniel Borkmann

On 09/16/2017 12:41 AM, Shuah Khan wrote:

On 09/15/2017 12:48 PM, Daniel Borkmann wrote:

On 09/15/2017 08:23 PM, Daniel Borkmann wrote:

On 09/15/2017 08:07 PM, Alexei Starovoitov wrote:

On Fri, Sep 15, 2017 at 05:58:40PM +0100, Edward Cree wrote:

On 15/09/17 17:02, Alexei Starovoitov wrote:

On Thu, Sep 14, 2017 at 09:33:48AM -0600, Shuah Khan wrote:

Is bpf test intended to be run in kselftest run? The clang dependency might
not be met on majority of the systems. Is this a hard dependency??

It is a hard dependency and clang should be present on majority of the systems.

I think this is the wrong approach.  Making kselftest hard-require clang doesn't
   mean that the bpf tests will be run more often, it means that the rest of the
   kselftests will be run less often.  clang is quite big (when I tried to install
   it on one of my test servers, I didn't have enough disk space & had to go on a
   clear-out of unused packages), and most people aren't interested in the bpf
   subsystem specifically; they would rather be able to skip those tests.
I feel that as long as they know they are skipping some tests (so e.g. they
   won't consider it a sufficient test of a kselftest refactor), that's fine.
It's not even as though all of the bpf tests require clang; the (smaller) tests
   written directly in raw eBPF instructions could still be run on such a system.
   So I think we should attempt to run as much as possible but accept that clang
   may not be available and have an option to skip some tests in that case.


imo the value of selftests/bpf is twofold:
1. it helps bpf developers avoid regressions
2. as part of continuous integration it helps to catch bpf regressions
that were somehow caused by changes in other parts of the kernel

If a developer didn't bother to satisfy all bpf tests dependencies
(which includes clang) and ran all tests before sending a patch,
I don't want to see such patches. It just wastes maintainers time
to review code and spot bugs that could have been caught by tests.
Collectively we invested years of work into these tests and
developers better take advantage of it by running all.


+1


If a CI server didn't satisfy all bpf test dependencies,
I don't want such CI setup to be running and reporting results,
since it will give false sense of test coverage.
Test failures due to missing dependencies are hard failures.
We cannot skip them.


+1


Btw, on that note, the folks from zero-day bot run the BPF kselftests
for a while now just fine and they do run them together with clang,
so they have the full, proper coverage how it should be. It's not
how it used to be in the early days, you can just go and install
llvm/clang package on all the major distros today and you get the
bpf target by default enabled already.


I'd like generic XDP tests to be added to selftests/bpf which
would mean that the latest iproute2 will become a hard dependency
and bpf developers and CI host owners would need to upgrade
their iproute2.
The tests either pass or fail. Skipping them due to missing
dependencies is the same as fail and in that sense I don't want
to change selftests/bpf/Makefile to make it skip clang.


I fully agree that for the BPF selftests it is very desirable
to not only test the verifier with couple of BPF insn snippets,
but to actually load and run programs that more closely resemble
real world programs. For more complex interactions these snippets
are just limited, think of tail calls, testing perf event output
helper, etc, which would all require to write these tests with
restricted C when we add them (unless we want to make writing
these tests a real pain ;) in which case no-one will bother to
write tests at all for them). Mid to long term I would definitely
like to see more programs in BPF selftests (e.g. moved over from
samples/bpf/) to increase the test coverage.


As I said in my earlier email:

Unless users choose to install clang, bpf will always fail run without
clang. So clang dependency is an issue for bpf test coverage in general.
That is your choice as to whether you want to increase the scope of
regression test coverage for bpf or not.

I fully understand you have to weigh that against the ease of writing tests.

We can leave things the way they are since:

- You can't force users to install clang and run bpf test. Users might
   opt for letting bpf test fail due to unmet dependency.

- You have reasons to continue using clang and you have been using it for
   a long time.


I'm definitely for leaving it as it currently is and having clang as
hard dependency; there will be more BPF selftests over time that will
require to compile BPF progs (through clang's BPF backend) written in
restricted C, so just skipping these tests would give a false sense
of coverage. clang is pretty much needed anyway for writing more complex
programs, thus leaving requirements the way they are is the much better
option.


I will try to see why make kselftest fails on bpf even with my patch
se

Re: selftests/bpf doesn't compile

2017-09-15 Thread Daniel Borkmann

On 09/15/2017 08:23 PM, Daniel Borkmann wrote:

On 09/15/2017 08:07 PM, Alexei Starovoitov wrote:

On Fri, Sep 15, 2017 at 05:58:40PM +0100, Edward Cree wrote:

On 15/09/17 17:02, Alexei Starovoitov wrote:

On Thu, Sep 14, 2017 at 09:33:48AM -0600, Shuah Khan wrote:

Is bpf test intended to be run in kselftest run? The clang dependency might
not be met on majority of the systems. Is this a hard dependency??

It is a hard dependency and clang should be present on majority of the systems.

I think this is the wrong approach.  Making kselftest hard-require clang doesn't
  mean that the bpf tests will be run more often, it means that the rest of the
  kselftests will be run less often.  clang is quite big (when I tried to install
  it on one of my test servers, I didn't have enough disk space & had to go on a
  clear-out of unused packages), and most people aren't interested in the bpf
  subsystem specifically; they would rather be able to skip those tests.
I feel that as long as they know they are skipping some tests (so e.g. they
  won't consider it a sufficient test of a kselftest refactor), that's fine.
It's not even as though all of the bpf tests require clang; the (smaller) tests
  written directly in raw eBPF instructions could still be run on such a system.
  So I think we should attempt to run as much as possible but accept that clang
  may not be available and have an option to skip some tests in that case.


imo the value of selftests/bpf is twofold:
1. it helps bpf developers avoid regressions
2. as part of continuous integration it helps to catch bpf regressions
that were somehow caused by changes in other parts of the kernel

If a developer didn't bother to satisfy all bpf tests dependencies
(which includes clang) and ran all tests before sending a patch,
I don't want to see such patches. It just wastes maintainers time
to review code and spot bugs that could have been caught by tests.
Collectively we invested years of work into these tests and
developers better take advantage of it by running all.


+1


If a CI server didn't satisfy all bpf test dependencies,
I don't want such CI setup to be running and reporting results,
since it will give false sense of test coverage.
Test failures due to missing dependencies are hard failures.
We cannot skip them.


+1


Btw, on that note, the folks from zero-day bot run the BPF kselftests
for a while now just fine and they do run them together with clang,
so they have the full, proper coverage how it should be. It's not
how it used to be in the early days, you can just go and install
llvm/clang package on all the major distros today and you get the
bpf target by default enabled already.


I'd like generic XDP tests to be added to selftests/bpf which
would mean that the latest iproute2 will become a hard dependency
and bpf developers and CI host owners would need to upgrade
their iproute2.
The tests either pass or fail. Skipping them due to missing
dependencies is the same as fail and in that sense I don't want
to change selftests/bpf/Makefile to make it skip clang.


I fully agree that for the BPF selftests it is very desirable
to not only test the verifier with couple of BPF insn snippets,
but to actually load and run programs that more closely resemble
real world programs. For more complex interactions these snippets
are just limited, think of tail calls, testing perf event output
helper, etc, which would all require to write these tests with
restricted C when we add them (unless we want to make writing
these tests a real pain ;) in which case no-one will bother to
write tests at all for them). Mid to long term I would definitely
like to see more programs in BPF selftests (e.g. moved over from
samples/bpf/) to increase the test coverage.




Re: selftests/bpf doesn't compile

2017-09-15 Thread Daniel Borkmann

On 09/15/2017 08:07 PM, Alexei Starovoitov wrote:

On Fri, Sep 15, 2017 at 05:58:40PM +0100, Edward Cree wrote:

On 15/09/17 17:02, Alexei Starovoitov wrote:

On Thu, Sep 14, 2017 at 09:33:48AM -0600, Shuah Khan wrote:

Is bpf test intended to be run in kselftest run? The clang dependency might
not be met on majority of the systems. Is this a hard dependency??

It is a hard dependency and clang should be present on majority of the systems.

I think this is the wrong approach.  Making kselftest hard-require clang doesn't
  mean that the bpf tests will be run more often, it means that the rest of the
  kselftests will be run less often.  clang is quite big (when I tried to install
  it on one of my test servers, I didn't have enough disk space & had to go on a
  clear-out of unused packages), and most people aren't interested in the bpf
  subsystem specifically; they would rather be able to skip those tests.
I feel that as long as they know they are skipping some tests (so e.g. they
  won't consider it a sufficient test of a kselftest refactor), that's fine.
It's not even as though all of the bpf tests require clang; the (smaller) tests
  written directly in raw eBPF instructions could still be run on such a system.
  So I think we should attempt to run as much as possible but accept that clang
  may not be available and have an option to skip some tests in that case.


imo the value of selftests/bpf is twofold:
1. it helps bpf developers avoid regressions
2. as part of continuous integration it helps to catch bpf regressions
that were somehow caused by changes in other parts of the kernel

If a developer didn't bother to satisfy all bpf tests dependencies
(which includes clang) and ran all tests before sending a patch,
I don't want to see such patches. It just wastes maintainers time
to review code and spot bugs that could have been caught by tests.
Collectively we invested years of work into these tests and
developers better take advantage of it by running all.


+1


If a CI server didn't satisfy all bpf test dependencies,
I don't want such CI setup to be running and reporting results,
since it will give false sense of test coverage.
Test failures due to missing dependencies are hard failures.
We cannot skip them.


+1


I'd like generic XDP tests to be added to selftests/bpf which
would mean that the latest iproute2 will become a hard dependency
and bpf developers and CI host owners would need to upgrade
their iproute2.
The tests either pass or fail. Skipping them due to missing
dependencies is the same as fail and in that sense I don't want
to change selftests/bpf/Makefile to make it skip clang.


I fully agree that for the BPF selftests it is very desirable
to not only test the verifier with couple of BPF insn snippets,
but to actually load and run programs that more closely resemble
real world programs. For more complex interactions these snippets
are just limited, think of tail calls, testing perf event output
helper, etc, which would all require to write these tests with
restricted C when we add them (unless we want to make writing
these tests a real pain ;) in which case no-one will bother to
write tests at all for them). Mid to long term I would definitely
like to see more programs in BPF selftests (e.g. moved over from
samples/bpf/) to increase the test coverage.


Re: [PATCH] selftests/bpf: Make bpf_util work on uniprocessor systems

2017-09-08 Thread Daniel Borkmann

On 09/09/2017 01:01 AM, Alexei Starovoitov wrote:

On Fri, Sep 08, 2017 at 01:19:23PM +0200, Thomas Meyer wrote:

The current implementation fails to work on uniprocessor systems.
Fix the parser to also handle the uniprocessor case.

Signed-off-by: Thomas Meyer <tho...@m3y3r.de>


Thanks for the fix. lgtm
Acked-by: Alexei Starovoitov <a...@kernel.org>


Looks good from here as well:

Acked-by: Daniel Borkmann <dan...@iogearbox.net>


This time it's ok to go via selftest tree, but next time please use net-next/net
to avoid conflicts.


+1


Thanks


---
  tools/testing/selftests/bpf/bpf_util.h | 17 +
  1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_util.h 
b/tools/testing/selftests/bpf/bpf_util.h
index 20ecbaa0d85d..6c53a8906eff 100644
--- a/tools/testing/selftests/bpf/bpf_util.h
+++ b/tools/testing/selftests/bpf/bpf_util.h
@@ -12,6 +12,7 @@ static inline unsigned int bpf_num_possible_cpus(void)
unsigned int start, end, possible_cpus = 0;
char buff[128];
FILE *fp;
+   int n;

fp = fopen(fcpu, "r");
if (!fp) {
@@ -20,17 +21,17 @@ static inline unsigned int bpf_num_possible_cpus(void)
}

while (fgets(buff, sizeof(buff), fp)) {
-   if (sscanf(buff, "%u-%u", &start, &end) == 2) {
-   possible_cpus = start == 0 ? end + 1 : 0;
-   break;
+   n = sscanf(buff, "%u-%u", &start, &end);
+   if (n == 0) {
+   printf("Failed to retrieve # possible CPUs!\n");
+   exit(1);
+   } else if (n == 1) {
+   end = start;
}
+   possible_cpus = start == 0 ? end + 1 : 0;
+   break;
}
-
fclose(fp);
-   if (!possible_cpus) {
-   printf("Failed to retrieve # possible CPUs!\n");
-   exit(1);
-   }

return possible_cpus;
  }
--
2.11.0
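
For reference, a small stand-alone illustration (not part of the patch) of
why checking the sscanf() return value covers both file formats; the values
below are just examples:

#include <stdio.h>

int main(void)
{
	unsigned int start, end;

	/* SMP: /sys/devices/system/cpu/possible holds a range such as "0-7" */
	printf("%d\n", sscanf("0-7", "%u-%u", &start, &end));	/* prints 2 */
	/* uniprocessor: the file holds just "0" */
	printf("%d\n", sscanf("0", "%u-%u", &start, &end));	/* prints 1 */
	return 0;
}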





Re: [PATCH v2 2/2] selftests: bpf: test_kmod.sh: use modprobe on target device

2017-09-07 Thread Daniel Borkmann

On 09/07/2017 10:19 AM, naresh.kamb...@linaro.org wrote:

From: Naresh Kamboju <naresh.kamb...@linaro.org>

On ARM and ARM64 devices the kernel source tree is not available, so
insmod "$SRC_TREE/lib/test_bpf.ko" does not work.

On these target devices test_bpf.ko is installed under
/lib/modules/`uname -r`/kernel/lib/, so use a modprobe dry run to check
for a missing test_bpf.ko module and insert it for testing.

Signed-off-by: Naresh Kamboju <naresh.kamb...@linaro.org>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>

One really small nit that could probably be fixed up along the
way when applying:


---
  tools/testing/selftests/bpf/test_kmod.sh | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_kmod.sh 
b/tools/testing/selftests/bpf/test_kmod.sh
index a53eb1cb54ef..eab9a970d742 100755
--- a/tools/testing/selftests/bpf/test_kmod.sh
+++ b/tools/testing/selftests/bpf/test_kmod.sh
@@ -14,6 +14,16 @@ test_run()
if [ $? -ne 0 ]; then
rc=1
fi
+else


Looks like a whitespace slipped in right before the "else";
it should be removed so that only the tab indent remains.


+   # Use modprobe dry run to check for missing test_bpf module
+   if ! /sbin/modprobe -q -n test_bpf; then
+   echo "test_bpf: [SKIP]"
+   elif /sbin/modprobe -q test_bpf; then
+   echo "test_bpf: ok"
+   else
+   echo "test_bpf: [FAIL]"
+   rc=1
+   fi
fi
rmmod  test_bpf 2> /dev/null
dmesg | grep FAIL





Re: [PATCH v2 1/2] selftests: bpf: test_kmod.sh: check if module is present in the path before insert

2017-09-07 Thread Daniel Borkmann

On 09/07/2017 10:19 AM, naresh.kamb...@linaro.org wrote:

From: Naresh Kamboju <naresh.kamb...@linaro.org>

The test script works when the kernel source and the built test_bpf.ko
module are present on the machine. This patch checks whether the module
is present in the path before trying to insert it.

Signed-off-by: Naresh Kamboju <naresh.kamb...@linaro.org>


Looks good, what changed between v1 and v2? Didn't get the cover
letter in case there was one. ;)

Which tree are you targeting? There are usually a lot of changes
in BPF selftests going the usual route via net and net-next tree
as we often need to put test cases alongside the BPF patches. Given
that we are in the merge window now and that one can regard this as a
fix, it's the net tree. I'm also ok if Shuah wants to pick it up this window as
test_kmod.sh hasn't been changed in quite a while, so no merge
conflicts expected.

Anyway, for the patch:

Acked-by: Daniel Borkmann <dan...@iogearbox.net>

Thanks!


---
  tools/testing/selftests/bpf/test_kmod.sh | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_kmod.sh 
b/tools/testing/selftests/bpf/test_kmod.sh
index 6d58cca8e235..a53eb1cb54ef 100755
--- a/tools/testing/selftests/bpf/test_kmod.sh
+++ b/tools/testing/selftests/bpf/test_kmod.sh
@@ -9,9 +9,11 @@ test_run()

echo "[ JIT enabled:$1 hardened:$2 ]"
dmesg -C
-   insmod $SRC_TREE/lib/test_bpf.ko 2> /dev/null
-   if [ $? -ne 0 ]; then
-   rc=1
+   if [ -f $SRC_TREE/lib/test_bpf.ko ]; then
+   insmod $SRC_TREE/lib/test_bpf.ko 2> /dev/null
+   if [ $? -ne 0 ]; then
+   rc=1
+   fi
fi
rmmod  test_bpf 2> /dev/null
dmesg | grep FAIL





Re: [PATCH][net-next][V3] bpf: test_maps: fix typos, "conenct" and "listeen"

2017-08-30 Thread Daniel Borkmann

On 08/30/2017 10:13 PM, Shuah Khan wrote:

On 08/30/2017 12:47 PM, Daniel Borkmann wrote:

On 08/30/2017 07:15 PM, Colin King wrote:

From: Colin Ian King <colin.k...@canonical.com>

Trivial fix to typos in printf error messages:
"conenct" -> "connect"
"listeen" -> "listen"

thanks to Daniel Borkmann for spotting one of these mistakes

Signed-off-by: Colin Ian King <colin.k...@canonical.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>


I can get this into 4.14-rc1 unless it should go through net-next
for dependencies. In which case,


Yeah, it does depend on work sitting in net-next, so it's easier
to go that route.

Thanks,
Daniel


Re: [PATCH][net-next][V3] bpf: test_maps: fix typos, "conenct" and "listeen"

2017-08-30 Thread Daniel Borkmann

On 08/30/2017 07:15 PM, Colin King wrote:

From: Colin Ian King <colin.k...@canonical.com>

Trivial fix to typos in printf error messages:
"conenct" -> "connect"
"listeen" -> "listen"

thanks to Daniel Borkmann for spotting one of these mistakes

Signed-off-by: Colin Ian King <colin.k...@canonical.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH][next][V2] bpf: test_maps: fix typo "conenct" -> "connect"

2017-08-30 Thread Daniel Borkmann

On 08/30/2017 01:47 PM, Colin King wrote:

From: Colin Ian King <colin.k...@canonical.com>

Trivial fix to typo in printf error message

Signed-off-by: Colin Ian King <colin.k...@canonical.com>


For net-next; looks like there is also one in "failed to listeen\n".
Want to fix this one as well ? ;)

Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH][next] MIPS,bpf: fix missing break in switch statement

2017-08-22 Thread Daniel Borkmann

On 08/23/2017 12:29 AM, David Daney wrote:

On 08/22/2017 03:03 PM, Colin King wrote:

From: Colin Ian King 

There is a missing break causing a fall-through and setting
ctx.use_bbit_insns to the wrong value. Fix this by adding the
missing break.

Detected with cppcheck:
"Variable 'ctx.use_bbit_insns' is reassigned a value before the old
one has been used. 'break;' missing?"

Fixes: 8d8d18c3283f ("MIPS,bpf: Fix using smp_processor_id() in preemptible 
splat.")
Signed-off-by: Colin Ian King 


Crap!  That slipped through.  Thanks for fixing it.

Tested and ...

Acked-by: David Daney 


Colin, can you send this with David's ACK to netdev in Cc
so it lands in patchwork? It's for net-next tree. Thanks!


Re: [PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread Daniel Borkmann

On 08/22/2017 05:08 PM, Daniel Borkmann wrote:

On 08/22/2017 08:36 AM, Shubham Bansal wrote:
[...]

+
+static int out_offset = -1; /* initialized on the first pass of build_body() */


Hm, why is this a global var actually? There can be
multiple parallel calls to bpf_int_jit_compile(), we
don't take a global lock on this. Unless I'm missing
something this should really reside in jit_ctx, no?


Hm, okay, it's for generating the out jmp offsets in
tail call emission which are supposed to always be the
same relative offsets; should be fine then.


Given this is on emit_bpf_tail_call(), did you get
tail calls working the way I suggested to test?


+static int emit_bpf_tail_call(struct jit_ctx *ctx)
  {

[...]

+const int idx0 = ctx->idx;
+#define cur_offset (ctx->idx - idx0)
+#define jmp_offset (out_offset - (cur_offset))

[...]

+
+/* out: */
+if (out_offset == -1)
+out_offset = cur_offset;
+if (cur_offset != out_offset) {
+pr_err_once("tail_call out_offset = %d, expected %d!\n",
+cur_offset, out_offset);
+return -1;
+}
+return 0;
+#undef cur_offset
+#undef jmp_offset
  }




Re: [PATCH v4 net-next] arm: eBPF JIT compiler

2017-08-22 Thread Daniel Borkmann

On 08/22/2017 08:36 AM, Shubham Bansal wrote:
[...]

+
+static int out_offset = -1; /* initialized on the first pass of build_body() */


Hm, why is this a global var actually? There can be
multiple parallel calls to bpf_int_jit_compile(), we
don't take a global lock on this. Unless I'm missing
something this should really reside in jit_ctx, no?

Given this is on emit_bpf_tail_call(), did you get
tail calls working the way I suggested to test?


+static int emit_bpf_tail_call(struct jit_ctx *ctx)
  {

[...]

+   const int idx0 = ctx->idx;
+#define cur_offset (ctx->idx - idx0)
+#define jmp_offset (out_offset - (cur_offset))

[...]

+
+   /* out: */
+   if (out_offset == -1)
+   out_offset = cur_offset;
+   if (cur_offset != out_offset) {
+   pr_err_once("tail_call out_offset = %d, expected %d!\n",
+   cur_offset, out_offset);
+   return -1;
+   }
+   return 0;
+#undef cur_offset
+#undef jmp_offset
  }
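
For reference, a minimal sketch of the per-context alternative raised here, i.e.
keeping the offset in jit_ctx instead of a file-scope variable (field placement
and the surrounding code are assumptions for illustration, not the submitted
patch):

struct jit_ctx {
	/* ... existing members such as idx, prog, offsets ... */
	int out_offset;			/* -1 until the first pass records it */
};

static int emit_bpf_tail_call(struct jit_ctx *ctx)
{
	const int idx0 = ctx->idx;

	/* ... emit bounds checks and the tail call itself here ... */

	/* out: */
	if (ctx->out_offset == -1)
		ctx->out_offset = ctx->idx - idx0;
	if (ctx->idx - idx0 != ctx->out_offset) {
		pr_err_once("tail_call out_offset = %d, expected %d!\n",
			    ctx->idx - idx0, ctx->out_offset);
		return -1;
	}
	return 0;
}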


Re: [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-21 Thread Daniel Borkmann

On 08/21/2017 10:44 PM, Edward Cree wrote:

On 21/08/17 21:27, Daniel Borkmann wrote:

On 08/21/2017 08:36 PM, Edward Cree wrote:

On 19/08/17 00:37, Alexei Starovoitov wrote:

[...]

I'm tempted to just rip out env->varlen_map_value_access and always check
   the whole thing, because honestly I don't know what it was meant to do
   originally or how it can ever do any useful pruning.  While drastic, it
   does cause your test case to pass.


Original intention from 484611357c19 ("bpf: allow access into map
value arrays") was that it wouldn't potentially make pruning worse
if PTR_TO_MAP_VALUE_ADJ was not used, meaning that we wouldn't need
to take reg state's min_value and max_value into account for state
checking; this was basically due to min_value / max_value being
adjusted/tracked on every alu/jmp op for involved regs (e.g.
adjust_reg_min_max_vals() and others that mangle them) even if we
have the case that no actual dynamic map access is used throughout
the program. To give an example on net tree, the bpf_lxc.o prog's
section increases from 36,386 to 68,226 when env->varlen_map_value_access
is always true, so it does have an effect. Did you do some checks
on this on net-next?

I tested with the cilium progs and saw no change in insn count.  I
  suspect that for the normal case I already killed this optimisation
  when I did my unification patch, it was previously about ignoring
  min/max values on all regs (including scalars), whereas on net-next
  it only ignores them on map_value pointers; in practice this is
  useless because we tend to still have the offset scalar sitting in
  a register somewhere.  (Come to think of it, this may have been
  behind a large chunk of the #insn increase that my patches caused.)


Yeah, this would seem plausible.


Since we use umax_value in find_good_pkt_pointers() now (to check
  against MAX_PACKET_OFF and ensure our reg->range is really ok), we
  can't just stop caring about all min/max values just because we
  haven't done any variable map accesses.
I don't see a way around this.


Agree, was thinking the same. If there's not really a regression in
terms of complexity, then let's kill the flag.


Re: [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-21 Thread Daniel Borkmann

On 08/21/2017 08:36 PM, Edward Cree wrote:

On 19/08/17 00:37, Alexei Starovoitov wrote:

[...]

I'm tempted to just rip out env->varlen_map_value_access and always check
  the whole thing, because honestly I don't know what it was meant to do
  originally or how it can ever do any useful pruning.  While drastic, it
  does cause your test case to pass.


Original intention from 484611357c19 ("bpf: allow access into map
value arrays") was that it wouldn't potentially make pruning worse
if PTR_TO_MAP_VALUE_ADJ was not used, meaning that we wouldn't need
to take reg state's min_value and max_value into account for state
checking; this was basically due to min_value / max_value being
adjusted/tracked on every alu/jmp op for involved regs (e.g.
adjust_reg_min_max_vals() and others that mangle them) even if we
have the case that no actual dynamic map access is used throughout
the program. To give an example on net tree, the bpf_lxc.o prog's
section increases from 36,386 to 68,226 when env->varlen_map_value_access
is always true, so it does have an effect. Did you do some checks
on this on net-next?


Re: [PATCH 0/3] MIPS,bpf: Improvements for MIPS eBPF JIT

2017-08-21 Thread Daniel Borkmann

On 08/21/2017 05:06 AM, David Miller wrote:

From: David Daney 
Date: Fri, 18 Aug 2017 16:40:30 -0700


I suggest that the whole thing go via the BPF/net-next path as there
are dependencies on code that is not yet merged to Linus' tree.


What kind of dependency?  On networking or MIPS changes?


On networking, David implemented the JLT, JLE, JSLT and JSLE
ops for the JIT in the patch set. Back then the MIPS JIT wasn't
in net-next tree, thus this is basically just a follow-up, so
that we have all covered with JIT support for net-next.

Thanks,
Daniel


Re: [PATCH net-next v3] arm: eBPF JIT compiler

2017-08-20 Thread Daniel Borkmann

On 08/19/2017 11:20 AM, Shubham Bansal wrote:
[...]

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 61a0cb1..cc31f8b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -50,7 +50,7 @@ config ARM
select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
select HAVE_ARCH_TRACEHOOK
select HAVE_ARM_SMCCC if CPU_V7
-   select HAVE_CBPF_JIT
+   select HAVE_EBPF_JIT
select HAVE_CC_STACKPROTECTOR
select HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index d5b9fa1..ea7d079 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1,6 +1,7 @@
  /*
- * Just-In-Time compiler for BPF filters on 32bit ARM
+ * Just-In-Time compiler for eBPF filters on 32bit ARM
   *
+ * Copyright (c) 2017 Shubham Bansal 
   * Copyright (c) 2011 Mircea Gherzan 
   *
   * This program is free software; you can redistribute it and/or modify it
@@ -8,6 +9,7 @@
   * Free Software Foundation; version 2 of the License.
   */

+#include 
  #include 
  #include 
  #include 
@@ -18,50 +20,96 @@
  #include 

  #include 
-#include 
  #include 
  #include 

  #include "bpf_jit_32.h"

+int bpf_jit_enable __read_mostly;


[...]

With the below #ifdef __LITTLE_ENDIAN spanning the entire
bpf_int_jit_compile(), a user can then enable and compile
eBPF JIT for big endian, even set the bpf_jit_enable to 1
to turn it on, but it won't JIT anything, which is contrary
to the expectation.

This should rather be a hard dependency in the Kconfig, if
I got it correctly, expressed as e.g.

select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32


+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
  {
+#ifdef __LITTLE_ENDIAN
+   struct bpf_prog *tmp, *orig_prog = prog;
struct bpf_binary_header *header;
+   bool tmp_blinded = false;
struct jit_ctx ctx;
-   unsigned tmp_idx;
-   unsigned alloc_size;
-   u8 *target_ptr;
+   unsigned int tmp_idx;
+   unsigned int image_size;
+   u8 *image_ptr;

+   /* If BPF JIT was not enabled then we must fall back to
+* the interpreter.
+*/
if (!bpf_jit_enable)
-   return;
+   return orig_prog;

-   memset(, 0, sizeof(ctx));
-   ctx.skf = fp;
-   ctx.ret0_fp_idx = -1;
+   /* If constant blinding was enabled and we failed during blinding
+* then we must fall back to the interpreter. Otherwise, we save
+* the new JITed code.
+*/
+   tmp = bpf_jit_blind_constants(prog);

-   ctx.offsets = kzalloc(4 * (ctx.skf->len + 1), GFP_KERNEL);
-   if (ctx.offsets == NULL)
-   return;
+   if (IS_ERR(tmp))
+   return orig_prog;
+   if (tmp != prog) {
+   tmp_blinded = true;
+   prog = tmp;
+   }
+
+   memset(, 0, sizeof(ctx));
+   ctx.prog = prog;

-   /* fake pass to fill in the ctx->seen */
-   if (unlikely(build_body()))
+   /* Not able to allocate memory for offsets[] , then
+* we must fall back to the interpreter
+*/
+   ctx.offsets = kcalloc(prog->len, sizeof(int), GFP_KERNEL);
+   if (ctx.offsets == NULL) {
+   prog = orig_prog;
goto out;
+   }
+
+   /* 1) fake pass to find in the length of the JITed code,
+* to compute ctx->offsets and other context variables
+* needed to compute final JITed code.
+* Also, calculate random starting pointer/start of JITed code
+* which is prefixed by random number of fault instructions.
+*
+* If the first pass fails then there is no chance of it
+* being successful in the second pass, so just fall back
+* to the interpreter.
+*/
+   if (build_body()) {
+   prog = orig_prog;
+   goto out_off;
+   }

tmp_idx = ctx.idx;
build_prologue();
ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;

+   ctx.epilogue_offset = ctx.idx;
+
  #if __LINUX_ARM_ARCH__ < 7
tmp_idx = ctx.idx;
build_epilogue();
@@ -1021,64 +1878,98 @@ void bpf_jit_compile(struct bpf_prog *fp)

ctx.idx += ctx.imm_count;
if (ctx.imm_count) {
-   ctx.imms = kzalloc(4 * ctx.imm_count, GFP_KERNEL);
-   if (ctx.imms == NULL)
-   goto out;
+   ctx.imms = kcalloc(ctx.imm_count, sizeof(u32), GFP_KERNEL);
+   if (ctx.imms == NULL) {
+   prog = orig_prog;
+   goto out_off;
+   }
}
  #else
-   /* there's nothing after the epilogue on ARMv7 */
+   /* there's nothing about the epilogue on ARMv7 */
build_epilogue();
  #endif
-   alloc_size = 4 * ctx.idx;
-   header = bpf_jit_binary_alloc(alloc_size, _ptr,
- 4, jit_fill_hole);
-   

Re: [PATCH 0/3] MIPS,bpf: Improvements for MIPS eBPF JIT

2017-08-18 Thread Daniel Borkmann

On 08/19/2017 01:40 AM, David Daney wrote:

Here are several improvements and bug fixes for the MIPS eBPF JIT.

The main change is the addition of support for JLT, JLE, JSLT and JSLE
ops, that were recently added.

Also fix WARN output when used with preemptable kernel, and a small
cleanup/optimization in the use of BPF_OP(insn->code).

I suggest that the whole thing go via the BPF/net-next path as there
are dependencies on code that is not yet merged to Linus' tree.


Yes, this would be via net-next.


Still pending are changes to reduce stack usage when the verifier can
determine the maximum stack size.


Awesome, thanks a lot!


Re: [PATCH v2] bpf: Update sysctl documentation to list all supported architectures

2017-08-17 Thread Daniel Borkmann

On 08/17/2017 12:30 PM, Michael Ellerman wrote:

The sysctl documentation states that the JIT is only available on
x86_64, which is no longer correct.

Update the list, and break it out to indicate which architectures
support the cBPF JIT (via HAVE_CBPF_JIT) or the eBPF JIT
(HAVE_EBPF_JIT).

Signed-off-by: Michael Ellerman <m...@ellerman.id.au>


The last paragraph speaking about tcpdump I can take care of
later; given the patch is about updating the list of archs,
lgtm, thanks!

Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH] bpf: Update sysctl documentation to list all supported architectures

2017-08-16 Thread Daniel Borkmann

On 08/16/2017 01:10 PM, Michael Ellerman wrote:

Daniel Borkmann <dan...@iogearbox.net> writes:

On 08/16/2017 07:15 AM, Michael Ellerman wrote:

The sysctl documentation states that the JIT is only available on
x86_64, which is no longer correct.

Update the list to include all architectures that enable HAVE_CBPF_JIT
or HAVE_EBPF_JIT under some configuration.

Signed-off-by: Michael Ellerman <m...@ellerman.id.au>


Thanks for the patch!


   Documentation/sysctl/net.txt | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 14db18c970b1..f68356024d09 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -36,8 +36,9 @@ bpf_jit_enable
   --

   This enables Berkeley Packet Filter Just in Time compiler.
-Currently supported on x86_64 architecture, bpf_jit provides a framework
-to speed packet filtering, the one used by tcpdump/libpcap for example.
+Currently supported on arm, arm64, mips, powerpc, s390, sparc and x86_64
+architectures, bpf_jit provides a framework to speed packet filtering, the one
+used by tcpdump/libpcap for example.


Good point, could we actually make that as a bullet list and
differentiate between cBPF and eBPF JITs, so that a user doesn't
need to run git grep HAVE_{E,C}BPF_JIT to figure out what the
switch enables on the arch used? That would be great.


We could.

Does a user of the sysctl want/need to know the difference though? Or do
they just want to turn on "the JIT"?


They would just turn it on, but I think it would be nice to inform
them which archs support eBPF (which is a superset of cBPF in terms
of what can be jited), so in case they have some native eBPF programs
they would see whether these can also be jited.


Re: [PATCH] bpf: Update sysctl documentation to list all supported architectures

2017-08-16 Thread Daniel Borkmann

Hi Michael,

On 08/16/2017 07:15 AM, Michael Ellerman wrote:

The sysctl documentation states that the JIT is only available on
x86_64, which is no longer correct.

Update the list to include all architectures that enable HAVE_CBPF_JIT
or HAVE_EBPF_JIT under some configuration.

Signed-off-by: Michael Ellerman 


Thanks for the patch!


  Documentation/sysctl/net.txt | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 14db18c970b1..f68356024d09 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -36,8 +36,9 @@ bpf_jit_enable
  --

  This enables Berkeley Packet Filter Just in Time compiler.
-Currently supported on x86_64 architecture, bpf_jit provides a framework
-to speed packet filtering, the one used by tcpdump/libpcap for example.
+Currently supported on arm, arm64, mips, powerpc, s390, sparc and x86_64
+architectures, bpf_jit provides a framework to speed packet filtering, the one
+used by tcpdump/libpcap for example.


Good point, could we actually make that as a bullet list and
differentiate between cBPF and eBPF JITs, so that a user doesn't
need to run git grep HAVE_{E,C}BPF_JIT to figure out what the
switch enables on the arch used? That would be great.

So for eBPF JITs, we have covered:

 * x86_64
 * arm64
 * ppc64
 * sparc64
 * mips64

For old cBPF, there is:

 * arm
 * mips
 * ppc
 * sparc

Thanks,
Daniel


  Values :
0 - disable the JIT (default value)
1 - enable the JIT
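
To illustrate, the split suggested above could read roughly as follows in
Documentation/sysctl/net.txt (the wording is only a sketch, not the text that
was eventually merged):

  This enables the BPF Just in Time compiler. bpf_jit provides a framework
  to speed up packet filtering, the one used by tcpdump/libpcap for example.
  Currently supported on:

   - eBPF JIT: x86_64, arm64, ppc64, sparc64, mips64
   - cBPF JIT: arm, mips, ppc, sparc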





Re: [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-15 Thread Daniel Borkmann

On 08/15/2017 09:34 PM, Edward Cree wrote:

State of a register doesn't matter if it wasn't read in reaching an exit;
  a write screens off all reads downstream of it from all explored_states
  upstream of it.
This allows us to prune many more branches; here are some processed insn
  counts for some Cilium programs:
Program  before  after
bpf_lb_opt_-DLB_L3.o   6515   3361
bpf_lb_opt_-DLB_L4.o   8976   5176
bpf_lb_opt_-DUNKNOWN.o 2960   1137
bpf_lxc_opt_-DDROP_ALL.o  95412  48537
bpf_lxc_opt_-DUNKNOWN.o  141706  78718
bpf_netdev.o  24251  17995
bpf_overlay.o 10999   9385

The runtime is also improved; here are 'time' results in ms:
Program  before  after
bpf_lb_opt_-DLB_L3.o 24  6
bpf_lb_opt_-DLB_L4.o 26 11
bpf_lb_opt_-DUNKNOWN.o   11  2
bpf_lxc_opt_-DDROP_ALL.o   1288139
bpf_lxc_opt_-DUNKNOWN.o1768234
bpf_netdev.o 62 31
bpf_overlay.o15 13

Signed-off-by: Edward Cree <ec...@solarflare.com>


Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [PATCH v2 net-next] bpf/verifier: track liveness for pruning

2017-08-15 Thread Daniel Borkmann

On 08/15/2017 03:53 PM, Edward Cree wrote:

State of a register doesn't matter if it wasn't read in reaching an exit;
  a write screens off all reads downstream of it from all explored_states
  upstream of it.
This allows us to prune many more branches; here are some processed insn
  counts for some Cilium programs:
Program  before  after
bpf_lb_opt_-DLB_L3.o   6515   3361
bpf_lb_opt_-DLB_L4.o   8976   5176
bpf_lb_opt_-DUNKNOWN.o 2960   1137
bpf_lxc_opt_-DDROP_ALL.o  95412  48537
bpf_lxc_opt_-DUNKNOWN.o  141706  78718
bpf_netdev.o  24251  17995
bpf_overlay.o 10999   9385

The runtime is also improved; here are 'time' results in ms:
Program  before  after
bpf_lb_opt_-DLB_L3.o 24  6
bpf_lb_opt_-DLB_L4.o 26 11
bpf_lb_opt_-DUNKNOWN.o   11  2
bpf_lxc_opt_-DDROP_ALL.o   1288139
bpf_lxc_opt_-DUNKNOWN.o1768234
bpf_netdev.o 62 31
bpf_overlay.o15 13

Signed-off-by: Edward Cree 

[...]

@@ -1639,10 +1675,13 @@ static int check_call(struct bpf_verifier_env *env, int 
func_id, int insn_idx)
}

/* reset caller saved regs */
-   for (i = 0; i < CALLER_SAVED_REGS; i++)
+   for (i = 0; i < CALLER_SAVED_REGS; i++) {
mark_reg_not_init(regs, caller_saved[i]);
+   check_reg_arg(env, i, DST_OP_NO_MARK);


Ah, I overlooked that earlier; this needs to be: s/i/caller_saved[i]/


+   }

/* update return register */
+   check_reg_arg(env, BPF_REG_0, DST_OP_NO_MARK);


We could leave this for clarity, but ...


if (fn->ret_type == RET_INTEGER) {
/* sets type to SCALAR_VALUE */
mark_reg_unknown(regs, BPF_REG_0);

[...]


/* reset caller saved regs to unreadable */
-   for (i = 0; i < CALLER_SAVED_REGS; i++)
+   for (i = 0; i < CALLER_SAVED_REGS; i++) {
mark_reg_not_init(regs, caller_saved[i]);
+   check_reg_arg(env, i, DST_OP_NO_MARK);


caller_saved[i]


+   }

/* mark destination R0 register as readable, since it contains
-* the value fetched from the packet
+* the value fetched from the packet.
+* Already marked as written above.


... then it should be here as well. The other option is to leave out
both BPF_REG_0 markings, since they are covered by caller_saved[] already.


 */
mark_reg_unknown(regs, BPF_REG_0);
return 0;
@@ -3194,7 +3236,11 @@ static bool regsafe(struct bpf_reg_state *rold,
struct bpf_reg_state *rcur,
bool varlen_map_access, struct idpair *idmap)
  {
-   if (memcmp(rold, rcur, sizeof(*rold)) == 0)
+   if (!(rold->live & REG_LIVE_READ))
+   /* explored state didn't use this */
+   return true;
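
Putting the two comments above together, the corrected reset loop would
presumably look like this (a sketch only, mirroring the s/i/caller_saved[i]/
fix suggested in the review, not the final patch):

	/* reset caller saved regs; mark them written and unreadable */
	for (i = 0; i < CALLER_SAVED_REGS; i++) {
		mark_reg_not_init(regs, caller_saved[i]);
		check_reg_arg(env, caller_saved[i], DST_OP_NO_MARK);
	}

	/* BPF_REG_0 is part of caller_saved[], so it is already marked
	 * as written above; the explicit check_reg_arg(env, BPF_REG_0,
	 * DST_OP_NO_MARK) can either stay for clarity or be dropped.
	 */
	mark_reg_unknown(regs, BPF_REG_0);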


Re: [PATCH net-next] bpf/verifier: track liveness for pruning

2017-08-15 Thread Daniel Borkmann

On 08/14/2017 07:55 PM, Edward Cree wrote:

State of a register doesn't matter if it wasn't read in reaching an exit;
  a write screens off all reads downstream of it from all explored_states
  upstream of it.
This allows us to prune many more branches; here are some processed insn
  counts for some Cilium programs:
Program  before  after
bpf_lb_opt_-DLB_L3.o   6515   3361
bpf_lb_opt_-DLB_L4.o   8976   5176
bpf_lb_opt_-DUNKNOWN.o 2960   1137
bpf_lxc_opt_-DDROP_ALL.o  95412  48537
bpf_lxc_opt_-DUNKNOWN.o  141706  79048
bpf_netdev.o  24251  17995
bpf_overlay.o 10999   9385

The runtime is also improved; here are 'time' results in ms:
Program  before  after
bpf_lb_opt_-DLB_L3.o 24  6
bpf_lb_opt_-DLB_L4.o 26 11
bpf_lb_opt_-DUNKNOWN.o   11  2
bpf_lxc_opt_-DDROP_ALL.o   1288152
bpf_lxc_opt_-DUNKNOWN.o1768257
bpf_netdev.o 62 31
bpf_overlay.o15 13

Signed-off-by: Edward Cree 


Awesome work!

[...]

if (arg_type == ARG_ANYTHING) {
if (is_pointer_value(env, regno)) {
@@ -1639,10 +1675,13 @@ static int check_call(struct bpf_verifier_env *env, int 
func_id, int insn_idx)
}

/* reset caller saved regs */
-   for (i = 0; i < CALLER_SAVED_REGS; i++)
+   for (i = 0; i < CALLER_SAVED_REGS; i++) {
mark_reg_not_init(regs, caller_saved[i]);
+   check_reg_arg(env, i, DST_OP_NO_MARK);


Don't we need the same in check_ld_abs(), since we treat it similarly
to a function call?


+   }

/* update return register */
+   check_reg_arg(env, BPF_REG_0, DST_OP_NO_MARK);

[...]


Re: [PATCH net-next V2 3/3] tap: XDP support

2017-08-14 Thread Daniel Borkmann

On 08/11/2017 01:41 PM, Jason Wang wrote:

This patch tries to implement XDP for tun. The implementation was
split into two parts:

[...]

@@ -1402,6 +1521,22 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
struct tun_file *tfile,
skb_reset_network_header(skb);
skb_probe_transport_header(skb, 0);

+   if (generic_xdp) {
+   struct bpf_prog *xdp_prog;
+   int ret;
+
+   rcu_read_lock();
+   xdp_prog = rcu_dereference(tun->xdp_prog);


The name generic_xdp is a bit confusing in this context given this
is 'native' XDP, perhaps above if (generic_xdp) should have a comment
explaining semantics for tun and how it relates to actual generic xdp
that sits at dev->xdp_prog, and gets run from netif_rx_ni(). Or just
name the bool xdp_handle_gso with a comment that we let the generic
XDP infrastructure deal with non-linear skbs instead of having to
re-implement the do_xdp_generic() internals, plus a statement that
the actual generic XDP comes a bit later in the path. That would at
least make it more obvious to read, imho.


+   if (xdp_prog) {
+   ret = do_xdp_generic(xdp_prog, skb);
+   if (ret != XDP_PASS) {
+   rcu_read_unlock();
+   return total_len;
+   }
+   }
+   rcu_read_unlock();
+   }
+
rxhash = __skb_get_hash_symmetric(skb);
  #ifndef CONFIG_4KSTACKS
tun_rx_batched(tun, tfile, skb, more);
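
To make the suggested renaming concrete, a sketch of how the flag and comment
might look in tun_get_user() (names are chosen here for illustration only,
this is not the final patch):

	/* Let the generic XDP infrastructure deal with non-linear skbs
	 * here rather than re-implementing do_xdp_generic(); the actual
	 * generic XDP hook at dev->xdp_prog still runs later via
	 * netif_rx_ni().
	 */
	if (xdp_handle_gso) {
		struct bpf_prog *xdp_prog;

		rcu_read_lock();
		xdp_prog = rcu_dereference(tun->xdp_prog);
		if (xdp_prog && do_xdp_generic(xdp_prog, skb) != XDP_PASS) {
			rcu_read_unlock();
			return total_len;
		}
		rcu_read_unlock();
	}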





Re: [RFC PATCH 2/2] bpf: Initialise mod[] in bpf_trace_printk

2017-08-11 Thread Daniel Borkmann

Hi James,

On 08/09/2017 10:34 PM, Daniel Borkmann wrote:

On 08/09/2017 09:39 AM, James Hogan wrote:
[...]

time (but please consider looking at the other patch which is certainly
a more real issue).


Sorry for the delay, started looking into that and whether we
have some other options, I'll get back to you on this.


Could we solve this more generically (as in: originally intended) in
the sense that we don't need to trust the gcc va_list handling; I feel
this is relying on an implementation detail? Perhaps something like
the below PoC patch?

Thanks again,
Daniel

From 71f16544d455abb6bb82f7253c17c14d2a395e91 Mon Sep 17 00:00:00 2001
Message-Id: 
<71f16544d455abb6bb82f7253c17c14d2a395e91.1502469361.git.dan...@iogearbox.net>
From: Daniel Borkmann <dan...@iogearbox.net>
Date: Fri, 11 Aug 2017 15:56:32 +0200
Subject: [PATCH] bpf: fix bpf_trace_printk on 32 bit

Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
---
 kernel/trace/bpf_trace.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3738519..d4cb36f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -204,10 +204,33 @@ static const struct bpf_func_proto 
*bpf_get_probe_write_proto(void)
fmt_cnt++;
}

-   return __trace_printk(1/* fake ip will not be printed */, fmt,
- mod[0] == 2 ? arg1 : mod[0] == 1 ? (long) arg1 : 
(u32) arg1,
- mod[1] == 2 ? arg2 : mod[1] == 1 ? (long) arg2 : 
(u32) arg2,
- mod[2] == 2 ? arg3 : mod[2] == 1 ? (long) arg3 : 
(u32) arg3);
+#define __BPF_TP_EMIT()__BPF_ARG3_TP()
+#define __BPF_TP(...)  \
+   __trace_printk(1 /* fake ip will not be printed */, \
+  fmt, ##__VA_ARGS__)
+
+#define __BPF_ARG1_TP(...) \
+   ((mod[0] == 2 || (mod[0] == 1 && __BITS_PER_LONG == 64))\
+ ? __BPF_TP(arg1, ##__VA_ARGS__)   \
+ : ((mod[0] == 1 || (mod[0] == 0 && __BITS_PER_LONG == 32))\
+ ? __BPF_TP((long)arg1, ##__VA_ARGS__) \
+ : __BPF_TP((u32)arg1, ##__VA_ARGS__)))
+
+#define __BPF_ARG2_TP(...) \
+   ((mod[1] == 2 || (mod[1] == 1 && __BITS_PER_LONG == 64))\
+ ? __BPF_ARG1_TP(arg2, ##__VA_ARGS__)  \
+ : ((mod[1] == 1 || (mod[1] == 0 && __BITS_PER_LONG == 32))\
+ ? __BPF_ARG1_TP((long)arg2, ##__VA_ARGS__)\
+ : __BPF_ARG1_TP((u32)arg2, ##__VA_ARGS__)))
+
+#define __BPF_ARG3_TP(...) \
+   ((mod[2] == 2 || (mod[2] == 1 && __BITS_PER_LONG == 64))\
+ ? __BPF_ARG2_TP(arg3, ##__VA_ARGS__)  \
+ : ((mod[2] == 1 || (mod[2] == 0 && __BITS_PER_LONG == 32))\
+ ? __BPF_ARG2_TP((long)arg3, ##__VA_ARGS__)\
+ : __BPF_ARG2_TP((u32)arg3, ##__VA_ARGS__)))
+
+   return __BPF_TP_EMIT();
 }

 static const struct bpf_func_proto bpf_trace_printk_proto = {
--
1.9.3
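
For context, the mod[] values that the macros above branch on are filled while
bpf_trace_printk() scans the format string; in simplified form (a sketch, not
the exact kernel code), each %-conversion records how many 'l' length modifiers
precede the final specifier, which is what decides the 32 vs 64 bit cast on a
32-bit arch:

	/* mod[n] == 0 -> %d/%u/%x   (32-bit argument)
	 * mod[n] == 1 -> %ld/%lu    (long: 32 or 64 bit depending on arch)
	 * mod[n] == 2 -> %lld/%llu  (64-bit argument)
	 */
	for (i = 0; i < fmt_size && fmt_cnt < 3; i++) {
		if (fmt[i] != '%')
			continue;
		i++;
		while (fmt[i] == 'l') {
			mod[fmt_cnt]++;
			i++;
		}
		/* the final conversion character is validated here */
		fmt_cnt++;
	}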


Re: [RFC PATCH 2/2] bpf: Initialise mod[] in bpf_trace_printk

2017-08-09 Thread Daniel Borkmann

On 08/09/2017 09:39 AM, James Hogan wrote:
[...]

time (but please consider looking at the other patch which is certainly
a more real issue).


Sorry for the delay, started looking into that and whether we
have some other options, I'll get back to you on this.

Thanks,
Daniel

