Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-15 Thread Eric W. Biederman
David Ahern  writes:

> On 2/15/17 8:25 PM, Andy Lutomirski wrote:
>> On Wed, Feb 15, 2017 at 7:18 PM, David Ahern  
>> wrote:
>>> On 2/15/17 8:08 PM, Eric W. Biederman wrote:
 David Ahern  writes:

> On 2/14/17 12:21 AM, Eric W. Biederman wrote:
>>> in cases where bpf programs are looking at sockets and packets
>>> that belong to different netns, it could be useful to get an id
>>> that uniquely identify a netns within the whole system.
>> It could be useful but there is no unique namespace id.
>>
>
> Have you given thought to a unique namespace id? Networking tracepoints
> for example could really benefit from a unique id.

 An id from the perspective of a process in the initial instance of every
 namespace is certainly possible.

 A truly unique id is just not maintainable.  Think of the question how
 do you assign every device in the world a rguaranteed unique ip address
 without coordination, that is routable.  It is essentially the same
 problem.

 AKA it is theoretically possible and very expensive.  It is much easier
 and much more maintainable for identifiers to have scope and only be
 unique within that scope.
>>>
>>>
>>> I don't mean unique in the entire world, I mean unique within a single
>>> system.
>>>
>>> Tracepoints are code based and have global scope. I would like to be
>>> able to correlate, for example, FIB lookups within a single network
>>> namespace. Having an id that I could filter on when collecting or match
>>> when dumping them goes a long way.
>> 
>> Why wouldn't an id relative to your logging program work?  Global ids
>> are problematic because they are incompatible with tools like CRIU.
>> 
>
> How would that work?
>
> To be specific with an example, I only want FIB lookups for network
> namespace "foo". The name "foo" only has meaning for iproute2, so I need
> something the kernel understands. Should that be a dev/inode match
> meaning the tracepoints contain the netns dev and inode?
>
> From a perf perspective, the command line is like this:
>perf record -e fib:fib_table_lookup --filter="netns_dev == 3 &&
> netns_ino == 4026531957" -a -g -- sleep 5
>
> Cumbersome, but it would work if the tracepoints had netns_dev and
> netns_ino as variables. A single id would be better.

A netns_dev_ino variable perhaps?  Something that you could pass a netns
file descriptor to perf and perf would just sort out the rest?

I believe those are just tooling issues.

The practical issue with one id that is global everywhere is that it has
to work for checkpoint/restart.   At which point it truly has to be
globably unique or namespaced.

Eric



Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-15 Thread David Ahern
On 2/15/17 8:25 PM, Andy Lutomirski wrote:
> On Wed, Feb 15, 2017 at 7:18 PM, David Ahern  wrote:
>> On 2/15/17 8:08 PM, Eric W. Biederman wrote:
>>> David Ahern  writes:
>>>
 On 2/14/17 12:21 AM, Eric W. Biederman wrote:
>> in cases where bpf programs are looking at sockets and packets
>> that belong to different netns, it could be useful to get an id
>> that uniquely identify a netns within the whole system.
> It could be useful but there is no unique namespace id.
>

 Have you given thought to a unique namespace id? Networking tracepoints
 for example could really benefit from a unique id.
>>>
>>> An id from the perspective of a process in the initial instance of every
>>> namespace is certainly possible.
>>>
>>> A truly unique id is just not maintainable.  Think of the question how
>>> do you assign every device in the world a rguaranteed unique ip address
>>> without coordination, that is routable.  It is essentially the same
>>> problem.
>>>
>>> AKA it is theoretically possible and very expensive.  It is much easier
>>> and much more maintainable for identifiers to have scope and only be
>>> unique within that scope.
>>
>>
>> I don't mean unique in the entire world, I mean unique within a single
>> system.
>>
>> Tracepoints are code based and have global scope. I would like to be
>> able to correlate, for example, FIB lookups within a single network
>> namespace. Having an id that I could filter on when collecting or match
>> when dumping them goes a long way.
> 
> Why wouldn't an id relative to your logging program work?  Global ids
> are problematic because they are incompatible with tools like CRIU.
> 

How would that work?

To be specific with an example, I only want FIB lookups for network
namespace "foo". The name "foo" only has meaning for iproute2, so I need
something the kernel understands. Should that be a dev/inode match
meaning the tracepoints contain the netns dev and inode?

>From a perf perspective, the command line is like this:
   perf record -e fib:fib_table_lookup --filter="netns_dev == 3 &&
netns_ino == 4026531957" -a -g -- sleep 5

Cumbersome, but it would work if the tracepoints had netns_dev and
netns_ino as variables. A single id would be better.


Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-15 Thread Andy Lutomirski
On Wed, Feb 15, 2017 at 7:18 PM, David Ahern  wrote:
> On 2/15/17 8:08 PM, Eric W. Biederman wrote:
>> David Ahern  writes:
>>
>>> On 2/14/17 12:21 AM, Eric W. Biederman wrote:
> in cases where bpf programs are looking at sockets and packets
> that belong to different netns, it could be useful to get an id
> that uniquely identify a netns within the whole system.
 It could be useful but there is no unique namespace id.

>>>
>>> Have you given thought to a unique namespace id? Networking tracepoints
>>> for example could really benefit from a unique id.
>>
>> An id from the perspective of a process in the initial instance of every
>> namespace is certainly possible.
>>
>> A truly unique id is just not maintainable.  Think of the question how
>> do you assign every device in the world a rguaranteed unique ip address
>> without coordination, that is routable.  It is essentially the same
>> problem.
>>
>> AKA it is theoretically possible and very expensive.  It is much easier
>> and much more maintainable for identifiers to have scope and only be
>> unique within that scope.
>
>
> I don't mean unique in the entire world, I mean unique within a single
> system.
>
> Tracepoints are code based and have global scope. I would like to be
> able to correlate, for example, FIB lookups within a single network
> namespace. Having an id that I could filter on when collecting or match
> when dumping them goes a long way.

Why wouldn't an id relative to your logging program work?  Global ids
are problematic because they are incompatible with tools like CRIU.

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-15 Thread David Ahern
On 2/15/17 8:08 PM, Eric W. Biederman wrote:
> David Ahern  writes:
> 
>> On 2/14/17 12:21 AM, Eric W. Biederman wrote:
 in cases where bpf programs are looking at sockets and packets
 that belong to different netns, it could be useful to get an id
 that uniquely identify a netns within the whole system.
>>> It could be useful but there is no unique namespace id.
>>>
>>
>> Have you given thought to a unique namespace id? Networking tracepoints
>> for example could really benefit from a unique id.
> 
> An id from the perspective of a process in the initial instance of every
> namespace is certainly possible.
> 
> A truly unique id is just not maintainable.  Think of the question how
> do you assign every device in the world a rguaranteed unique ip address
> without coordination, that is routable.  It is essentially the same
> problem.
> 
> AKA it is theoretically possible and very expensive.  It is much easier
> and much more maintainable for identifiers to have scope and only be
> unique within that scope.


I don't mean unique in the entire world, I mean unique within a single
system.

Tracepoints are code based and have global scope. I would like to be
able to correlate, for example, FIB lookups within a single network
namespace. Having an id that I could filter on when collecting or match
when dumping them goes a long way.


Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-15 Thread Eric W. Biederman
David Ahern  writes:

> On 2/14/17 12:21 AM, Eric W. Biederman wrote:
>>> in cases where bpf programs are looking at sockets and packets
>>> that belong to different netns, it could be useful to get an id
>>> that uniquely identify a netns within the whole system.
>> It could be useful but there is no unique namespace id.
>> 
>
> Have you given thought to a unique namespace id? Networking tracepoints
> for example could really benefit from a unique id.

An id from the perspective of a process in the initial instance of every
namespace is certainly possible.

A truly unique id is just not maintainable.  Think of the question how
do you assign every device in the world a rguaranteed unique ip address
without coordination, that is routable.  It is essentially the same
problem.

AKA it is theoretically possible and very expensive.  It is much easier
and much more maintainable for identifiers to have scope and only be
unique within that scope.

Eric


Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-15 Thread David Ahern
On 2/14/17 12:21 AM, Eric W. Biederman wrote:
>> in cases where bpf programs are looking at sockets and packets
>> that belong to different netns, it could be useful to get an id
>> that uniquely identify a netns within the whole system.
> It could be useful but there is no unique namespace id.
> 

Have you given thought to a unique namespace id? Networking tracepoints
for example could really benefit from a unique id.


Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-13 Thread Eric W. Biederman
Alexei Starovoitov  writes:

> in cases where bpf programs are looking at sockets and packets
> that belong to different netns, it could be useful to get an id
> that uniquely identify a netns within the whole system.

It could be useful but there is no unique namespace id.

> Therefore introduce 'u64 bpf_sk_netns_id(sk);' helper. It returns
> unique value that identifies netns of given socket or dev_net(skb->dev)
> The upper 32-bits of the return value contain device id where namespace
> filesystem resides and lower 32-bits contain inode number within that 
> filesystem.
> It's the same as
>  struct stat st;
>  stat("/proc/pid/ns/net", &st);
>  return (st->st_dev << 32)  | st->st_ino;

The function is fundamentally buggy.  Inode numbers are 64bit and need
to be 64bit whenever we expose them to userspace.  Otherwise we are
painting ourselves into a corner with respect to future expansion.

> For example to disallow raw sockets in all non-init netns
> the bpf_type_cgroup_sock program can do:
> if (sk->type == SOCK_RAW && bpf_sk_netns_id(sk) != 0x3f075)
>   return 0;
> where 0x3f075 comes from combination of st_dev and st_ino
> of /proc/pid/ns/net

Which is generally a reasonable type of thing to do.

However if we make the logic look like:

if (sk->type == SOCK_RAW && bpf_sk_net(sk, 0x3f, 0x75))
return 0;

With the comparison in the function call itself.  That will solve the
32 vs 64bit inode number issue as well putting the burden on matching
what userspace sees to what the kernel sees to the kernel.  Which
is much more future proof.

I suspect the bpf verifier can even be enhanced to check that the last
two arguments are constants.  Limiting the device number and inode
number to constants will make further optimizations/simplifcations
possible.   But that is just a nice to have.

But the key thing here is that if we pass the device number and the
inode to the kernel and ask it to compare, the kernel can lookup up the
namespace by device+inode and see if it matches what is on the socket
without any need for that to be a unique name of the network namespace
which is 1000 times more maintainable then returning a magic string.

Which means even if all we do in kernel churn is go back to the
implementation that existed a little while ago where the device number
depended upon which mount of proc you looked at, the bpf filters written
today can all be made to work with any challenge.

Does that make sense?

Eric


Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-08 Thread David Miller
From: Alexei Starovoitov 
Date: Mon, 6 Feb 2017 18:02:48 -0800

> Eric, I'v added proc_get_ns_devid_inum() to nsfs.c
> right next to __ns_get_path(), so when it is time in the future
> to make nsfs more namespace aware, it will be easy to adjust
> both new_inode_pseudo(mnt->mnt_sb) line and proc_get_ns_devid_inum()
> I thought about using ns->stashed, but it's obviously transient
> inode and not usable. If later we decide to store dev_t into ns_common
> it will be fine as well. We'll just change proc_get_ns_devid_inum()
> without affecting user space.

Eric?


Re: [PATCH v4 net] bpf: add bpf_sk_netns_id() helper

2017-02-07 Thread Daniel Borkmann

On 02/07/2017 03:02 AM, Alexei Starovoitov wrote:

in cases where bpf programs are looking at sockets and packets
that belong to different netns, it could be useful to get an id
that uniquely identify a netns within the whole system.

Therefore introduce 'u64 bpf_sk_netns_id(sk);' helper. It returns
unique value that identifies netns of given socket or dev_net(skb->dev)
The upper 32-bits of the return value contain device id where namespace
filesystem resides and lower 32-bits contain inode number within that 
filesystem.

[...]

Signed-off-by: Alexei Starovoitov 
Reviewed-by: David Ahern 
Tested-by: David Ahern 


For the BPF bits:

Acked-by: Daniel Borkmann