Re: XDP question: best API for returning/setting egress port?

2017-04-20 Thread Andy Gospodarek
On Wed, Apr 19, 2017 at 11:42:30PM +0200, Daniel Borkmann wrote:
> On 04/19/2017 10:02 PM, Andy Gospodarek wrote:
> > On Tue, Apr 18, 2017 at 09:58:56PM +0200, Jesper Dangaard Brouer wrote:
> > > 
> > > As I argued in NetConf presentation[1] (from slide #9) we need a port
> > > mapping table (instead of using ifindex'es).  Both for supporting
> > > other "port" types than net_devices (think sockets), and for
> > > sandboxing what XDP can bypass.
> > > 
> > > I want to create a new XDP action called XDP_REDIRECT, that instruct
> > > XDP to send the xdp_buff to another "port" (get translated into a
> > > net_device, or something else depending on internal port type).
> > > 
> > > Looking at the userspace/eBPF interface, I'm wondering what is the
> > > best API for "returning" this port number from eBPF?
> > > 
> > > The options I see is:
> > > 
> > > 1) Split-up the u32 action code, and e.g let the high-16-bit be the
> > > port number and lower-16bit the (existing) action verdict.
> > > 
> > >   Pros: Simple API
> > >   Cons: Number of ports limited to 64K
> > 
> > Practically speaking this may be seem reserving 64k for the port number
> > might be enough space, but I would also like to see a new return option
> > for flags so I start to get concerned about space.  Daniel also
> > hightlights the fact that encoding the port in the action may does not
> > leave room for flags and could get confusing.
> > 
> > One unfortunate side-effect of dropping or transmitting frames with XDP
> > is that we lose the opportunity to statistically sample in netfilter
> > since the frames were dropped so early and I'd like to bring that back
> > with a call to parse flags and possibly call psample_sample_packet()
> > after the xdp action.  Packet sampling cannot simply be an action
> > since there are times when a frame should be dropped but should also be
> > sampled, so it seems logical to add this as a flag.
> 
> Hmm, you can do that already today with bpf_xdp_event_output() helper
> and the program decides when to sample and what custom meta data to
> prepend along with the (partial or full) xdp packet, see also [1], slide
> 11 for the "XDP dump" part.
> 
> No need to change drivers for this, psample_sample_packet() would also
> be slower.
> 
>   [1] http://netdevconf.org/2.1/slides/apr6/zhou-netdev-xdp-2017.pdf

Thanks, Daniel.  Not sure how I missed this, but I appreciate the pointer.
I'll add coding up a program to demo this to my TODO list.



Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread Jesper Dangaard Brouer
On Thu, 20 Apr 2017 00:51:31 +0200
Daniel Borkmann  wrote:

> On 04/19/2017 10:02 PM, Andy Gospodarek wrote:
> [...]
> > and then lookup this dest in a table we have the option to make that
> > dest an ifindex/socket/other.
> >
> > I did also look at JohnF's patch and I do like the simplicity of the 
> > redirect
> > action and new ndo_xdp_xmit and how it moves towards a way to transmit the
> > frame.  The downside is that it presumes an ifindex, so it might not be 
> > ideal
> > we want the lookup to return something other than an ifindex.
> >  
> [...]
> > would be handled.  If we are ultimately going to need a new netdev op to
> > handle the redirect then what may be the issue with not providing the
> > destination port the return code and the option proposed by JohnF looks
> > good to me with maybe a small tweak to not presume ifindex in some manner.  
> 
> Is there a concrete reason that all the proposed future cases like sockets
> have to be handled within the very same XDP_REDIRECT return code? F.e. why
> not XDP_TX_NIC that only assumes ifindex as proposed in the patch, and future
> ones would get a different return code f.e. XDP_TX_SK only handling sockets
> when we get there implementation-wise?

The concrete reason for an "unified" return code is to simplify changes
needed in the drivers.  The purpose is moving code out of the drivers
into the core.  IMHO we should keep drivers as simple as possible and
stop adding more return action codes. XDP_REDIRECT combined with a
port-table should be the last action-code needed in the drivers.
As the port table will know the "type" of the port.

Keeping action XDP_DROP and XDP_TX inside the drivers make sense,
because it allow for driver level optimizations. Actions that want to
transfer the packet/xdp_buff "outside" the driver should call a core
function (given it will be called with the xdp_buff struct, you can
hide all your extra extensions there without changing the driver code).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread John Fastabend
[...]

> JohnF, any test results with this you can share?  Presumably you tested with
> virtio-net, right?
> 

For my patches using the xdp_redirect with virtio-net showed no measurable
difference from running just the straight XDP_TX in the PPS column. However
virtio_net XDP_TX was quite low to start with (and I was mostly just looking
at functionality so I don't recall if there was a CPU usage hit) so I expect on
ixgbe to see a difference. However I have plenty of cpu overhead to consume on
the 10Gbps card so it will likely just be a CPU % cost not a PPS cost. I'll
bother Alex about the 40Gbps cards ;)

Anyways, I'll put it on my list to rerun. But like the xdp generic pieces it
wont be for a couple days until I get around to it. Sorry need to take care of
a few things first.

Thanks,
John


Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread Daniel Borkmann

On 04/19/2017 10:02 PM, Andy Gospodarek wrote:
[...]

and then lookup this dest in a table we have the option to make that
dest an ifindex/socket/other.

I did also look at JohnF's patch and I do like the simplicity of the redirect
action and new ndo_xdp_xmit and how it moves towards a way to transmit the
frame.  The downside is that it presumes an ifindex, so it might not be ideal
we want the lookup to return something other than an ifindex.


[...]

would be handled.  If we are ultimately going to need a new netdev op to
handle the redirect then what may be the issue with not providing the
destination port the return code and the option proposed by JohnF looks
good to me with maybe a small tweak to not presume ifindex in some manner.


Is there a concrete reason that all the proposed future cases like sockets
have to be handled within the very same XDP_REDIRECT return code? F.e. why
not XDP_TX_NIC that only assumes ifindex as proposed in the patch, and future
ones would get a different return code f.e. XDP_TX_SK only handling sockets
when we get there implementation-wise?


Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread Daniel Borkmann

On 04/19/2017 10:02 PM, Andy Gospodarek wrote:

On Tue, Apr 18, 2017 at 09:58:56PM +0200, Jesper Dangaard Brouer wrote:


As I argued in NetConf presentation[1] (from slide #9) we need a port
mapping table (instead of using ifindex'es).  Both for supporting
other "port" types than net_devices (think sockets), and for
sandboxing what XDP can bypass.

I want to create a new XDP action called XDP_REDIRECT, that instruct
XDP to send the xdp_buff to another "port" (get translated into a
net_device, or something else depending on internal port type).

Looking at the userspace/eBPF interface, I'm wondering what is the
best API for "returning" this port number from eBPF?

The options I see is:

1) Split-up the u32 action code, and e.g let the high-16-bit be the
port number and lower-16bit the (existing) action verdict.

  Pros: Simple API
  Cons: Number of ports limited to 64K


Practically speaking this may be seem reserving 64k for the port number
might be enough space, but I would also like to see a new return option
for flags so I start to get concerned about space.  Daniel also
hightlights the fact that encoding the port in the action may does not
leave room for flags and could get confusing.

One unfortunate side-effect of dropping or transmitting frames with XDP
is that we lose the opportunity to statistically sample in netfilter
since the frames were dropped so early and I'd like to bring that back
with a call to parse flags and possibly call psample_sample_packet()
after the xdp action.  Packet sampling cannot simply be an action
since there are times when a frame should be dropped but should also be
sampled, so it seems logical to add this as a flag.


Hmm, you can do that already today with bpf_xdp_event_output() helper
and the program decides when to sample and what custom meta data to
prepend along with the (partial or full) xdp packet, see also [1], slide
11 for the "XDP dump" part.

No need to change drivers for this, psample_sample_packet() would also
be slower.

  [1] http://netdevconf.org/2.1/slides/apr6/zhou-netdev-xdp-2017.pdf


Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread Andy Gospodarek
On Tue, Apr 18, 2017 at 09:58:56PM +0200, Jesper Dangaard Brouer wrote:
> 
> As I argued in NetConf presentation[1] (from slide #9) we need a port
> mapping table (instead of using ifindex'es).  Both for supporting
> other "port" types than net_devices (think sockets), and for
> sandboxing what XDP can bypass.
> 
> I want to create a new XDP action called XDP_REDIRECT, that instruct
> XDP to send the xdp_buff to another "port" (get translated into a
> net_device, or something else depending on internal port type).
> 
> Looking at the userspace/eBPF interface, I'm wondering what is the
> best API for "returning" this port number from eBPF?
> 
> The options I see is:
> 
> 1) Split-up the u32 action code, and e.g let the high-16-bit be the
>port number and lower-16bit the (existing) action verdict.
> 
>  Pros: Simple API
>  Cons: Number of ports limited to 64K

Practically speaking this may be seem reserving 64k for the port number
might be enough space, but I would also like to see a new return option
for flags so I start to get concerned about space.  Daniel also
hightlights the fact that encoding the port in the action may does not
leave room for flags and could get confusing.

One unfortunate side-effect of dropping or transmitting frames with XDP
is that we lose the opportunity to statistically sample in netfilter
since the frames were dropped so early and I'd like to bring that back
with a call to parse flags and possibly call psample_sample_packet()
after the xdp action.  Packet sampling cannot simply be an action
since there are times when a frame should be dropped but should also be
sampled, so it seems logical to add this as a flag.

> 
> 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow
>eBPF to update xdp_md->port.
> 
>  Pros: Larger number of ports.
>  Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md.
>(see xdp_convert_ctx_access)

I think I would lean towards this based on the fact that I'd like to see
a flags field added to the u32 return (maybe the top 8 bits) as
mentioned above.

So if we follow down this path and add a 'dest' field to xdp_buff and
xdp_md like this:

struct xdp_buff {
void *data;
void *data_end;
void *data_hard_start;
__u32 dest; 
};

struct xdp_md {
__u32 data;
__u32 data_end;
__u32 dest;
};

and then lookup this dest in a table we have the option to make that
dest an ifindex/socket/other.

I did also look at JohnF's patch and I do like the simplicity of the redirect
action and new ndo_xdp_xmit and how it moves towards a way to transmit the
frame.  The downside is that it presumes an ifindex, so it might not be ideal
we want the lookup to return something other than an ifindex.

Before aligning on a direction for the return values from exiting xdp
call, it seems like we should also think about the tx side and how that
would be handled.  If we are ultimately going to need a new netdev op to
handle the redirect then what may be the issue with not providing the
destination port the return code and the option proposed by JohnF looks
good to me with maybe a small tweak to not presume ifindex in some manner.

JohnF, any test results with this you can share?  Presumably you tested with
virtio-net, right?

> 
> 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff.
> 
>  Pros: Hides impl details, and allows helper to give eBPF code feedback
>(on e.g. if port doesn't exist any longer)
>  Cons: Helper function call likely slower?
> 
> 
> (Cc'ed xdp-newbies as end-users might have an opinion on UAPI?)
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 
> [1] 
> http://people.netfilter.org/hawk/presentations/NetConf2017/xdp_work_ahead_NetConf_April_2017.pdf


Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread Jesper Dangaard Brouer
On Wed, 19 Apr 2017 14:33:27 +0200
Daniel Borkmann  wrote:

> On 04/19/2017 02:00 PM, Jesper Dangaard Brouer wrote:
> > On Tue, 18 Apr 2017 13:54:45 -0700
> > John Fastabend  wrote:  
> >> On 17-04-18 12:58 PM, Jesper Dangaard Brouer wrote:  
> >>>
> >>> As I argued in NetConf presentation[1] (from slide #9) we need a port
> >>> mapping table (instead of using ifindex'es).  Both for supporting
> >>> other "port" types than net_devices (think sockets), and for
> >>> sandboxing what XDP can bypass.
> >>>
> >>> I want to create a new XDP action called XDP_REDIRECT, that instruct
> >>> XDP to send the xdp_buff to another "port" (get translated into a
> >>> net_device, or something else depending on internal port type).
> >>>
> >>> Looking at the userspace/eBPF interface, I'm wondering what is the
> >>> best API for "returning" this port number from eBPF?
> >>>
> >>> The options I see is:
> >>>
> >>> 1) Split-up the u32 action code, and e.g let the high-16-bit be the
> >>> port number and lower-16bit the (existing) action verdict.
> >>>
> >>>   Pros: Simple API
> >>>   Cons: Number of ports limited to 64K
> >>>
> >>> 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow
> >>> eBPF to update xdp_md->port.
> >>>
> >>>   Pros: Larger number of ports.
> >>>   Cons: This require some ebpf translation steps between xdp_buff <-> 
> >>> xdp_md.
> >>> (see xdp_convert_ctx_access)
> >>>
> >>> 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff.
> >>>
> >>>   Pros: Hides impl details, and allows helper to give eBPF code feedback
> >>> (on e.g. if port doesn't exist any longer)
> >>>   Cons: Helper function call likely slower?  
> >>
> >> How about doing this the same way redirect is done in the tc case? I have 
> >> this
> >> patch under test,
> >>
> >>   
> >> https://github.com/jrfastab/linux/commit/e78f5425d5e3c305b4170ddd85c61c2e15359fee
> >>   
> >
> > I have been looking at this approach, which is close to option #3 above.
> >
> > The problem with your implementation that you use a per-cpu store.
> > This creates the problem of storing state between packets. First packet
> > can call helper bpf_xdp_redirect() setting an ifindex, but program can
> > still return XDP_PASS.  Next packet can call XDP_REDIRECT and use the
> > ifindex set from the first packet.  IMHO this is a problematic API to
> > expose.
> >
> > I do see that the TC interface that uses the same approach, via helper
> > bpf_redirect().  Maybe it have the same API problem?  Looking at
> > sch_handle_ingress() I don't see this is handled (e.g. by always
> > clearing this_cpu_ptr(redirect_info)->ifindex = 0).  
> 
> It's cleared in {skb,xdp}_do_redirect() right after fetching the
> ifindex. I think this approach is just fine. The example described
> above is a misuse of the API by a buggy program calling bpf_xdp_redirect()
> and returning XDP_PASS while another time it returns XDP_REDIRECT
> without the bpf_xdp_redirect() helper, sounds very exotic, but it's
> as buggy as, say, a program doing the csum update wrong, a program
> writing the wrong data to the packet, doing adjust head on the wrong
> header offset, jumping into the wrong tail call entry and other things.

For TC I guess it is fine to keep it as is, because it is needed to
avoid extending skb.  IHMO for XDP I see no reason to keep a
per-cpu-store (which besides will be slower), simply update
xdp_buff.port should be sufficient (which is only relevant for this
packet).

As noted in option#3, my concern is that calling a helper function call
will be slower, than simply returning the needed port info? 

Maybe some bpf experts can tell me if such helper call could be
optimized out with some bpf magic?

> I think encoding this into an action code is rather limiting, f.e.
> where would we place a flags argument if needed in future? Would
> that mean, we need a XDP_REDIRECT2 return code that also allows for
> encoding flags?

Nope, it will be extensible.

We can start with:

 struct xdp_ret {
 union {
 __u32 act;
 struct {
 __u16 action;
 __u16 port;
 };
 };

And later change it to:

 struct xdp_ret {
 union {
 __u32 act;
 struct {
 __u8  action;
 __u8  flags;
 __u16 port;
 };
 };

If actions does not go above 255.  I would prefer that we start with
the latter, else people would argue that we need to extend the
structure like:

 struct xdp_ret {
 union {
 __u32 act;
 struct {
 union {
 __u16 action;
 struct {
 __u8 action2;
 __u8 flags;
 };
 };
 __u16 port;
 };
 };


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread Daniel Borkmann

On 04/19/2017 02:00 PM, Jesper Dangaard Brouer wrote:

On Tue, 18 Apr 2017 13:54:45 -0700
John Fastabend  wrote:

On 17-04-18 12:58 PM, Jesper Dangaard Brouer wrote:


As I argued in NetConf presentation[1] (from slide #9) we need a port
mapping table (instead of using ifindex'es).  Both for supporting
other "port" types than net_devices (think sockets), and for
sandboxing what XDP can bypass.

I want to create a new XDP action called XDP_REDIRECT, that instruct
XDP to send the xdp_buff to another "port" (get translated into a
net_device, or something else depending on internal port type).

Looking at the userspace/eBPF interface, I'm wondering what is the
best API for "returning" this port number from eBPF?

The options I see is:

1) Split-up the u32 action code, and e.g let the high-16-bit be the
port number and lower-16bit the (existing) action verdict.

  Pros: Simple API
  Cons: Number of ports limited to 64K

2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow
eBPF to update xdp_md->port.

  Pros: Larger number of ports.
  Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md.
(see xdp_convert_ctx_access)

3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff.

  Pros: Hides impl details, and allows helper to give eBPF code feedback
(on e.g. if port doesn't exist any longer)
  Cons: Helper function call likely slower?


How about doing this the same way redirect is done in the tc case? I have this
patch under test,

  
https://github.com/jrfastab/linux/commit/e78f5425d5e3c305b4170ddd85c61c2e15359fee


I have been looking at this approach, which is close to option #3 above.

The problem with your implementation that you use a per-cpu store.
This creates the problem of storing state between packets. First packet
can call helper bpf_xdp_redirect() setting an ifindex, but program can
still return XDP_PASS.  Next packet can call XDP_REDIRECT and use the
ifindex set from the first packet.  IMHO this is a problematic API to
expose.

I do see that the TC interface that uses the same approach, via helper
bpf_redirect().  Maybe it have the same API problem?  Looking at
sch_handle_ingress() I don't see this is handled (e.g. by always
clearing this_cpu_ptr(redirect_info)->ifindex = 0).


It's cleared in {skb,xdp}_do_redirect() right after fetching the
ifindex. I think this approach is just fine. The example described
above is a misuse of the API by a buggy program calling bpf_xdp_redirect()
and returning XDP_PASS while another time it returns XDP_REDIRECT
without the bpf_xdp_redirect() helper, sounds very exotic, but it's
as buggy as, say, a program doing the csum update wrong, a program
writing the wrong data to the packet, doing adjust head on the wrong
header offset, jumping into the wrong tail call entry and other things.

I think encoding this into an action code is rather limiting, f.e.
where would we place a flags argument if needed in future? Would
that mean, we need a XDP_REDIRECT2 return code that also allows for
encoding flags?

Thanks,
Daniel


Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread Hannes Frederic Sowa
Hi,

On 18.04.2017 21:58, Jesper Dangaard Brouer wrote:
> 
> As I argued in NetConf presentation[1] (from slide #9) we need a port
> mapping table (instead of using ifindex'es).  Both for supporting
> other "port" types than net_devices (think sockets), and for
> sandboxing what XDP can bypass.
> 
> I want to create a new XDP action called XDP_REDIRECT, that instruct
> XDP to send the xdp_buff to another "port" (get translated into a
> net_device, or something else depending on internal port type).
> 
> Looking at the userspace/eBPF interface, I'm wondering what is the
> best API for "returning" this port number from eBPF?
> 
> The options I see is:
> 
> 1) Split-up the u32 action code, and e.g let the high-16-bit be the
>port number and lower-16bit the (existing) action verdict.
> 
>  Pros: Simple API
>  Cons: Number of ports limited to 64K
> 
> 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow
>eBPF to update xdp_md->port.
> 
>  Pros: Larger number of ports.
>  Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md.
>(see xdp_convert_ctx_access)
> 
> 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff.
> 
>  Pros: Hides impl details, and allows helper to give eBPF code feedback
>(on e.g. if port doesn't exist any longer)
>  Cons: Helper function call likely slower?
> 
> 
> (Cc'ed xdp-newbies as end-users might have an opinion on UAPI?)

I am not sure how the socket interface should look like, it seems to be
too far away to me right now.

Regarding having stable ifindexes, I wonder if we could do something:

int ifindexes_in_use_by_ebpf_program[] __section("ifindex") = {
1,2,3,8,9,10 };

and we can make sure that the ifindexes automatically stay stable for
the lifetime while the ebpf program is loaded?

Bye,
Hannes


Re: XDP question: best API for returning/setting egress port?

2017-04-19 Thread Jesper Dangaard Brouer
On Tue, 18 Apr 2017 13:54:45 -0700
John Fastabend  wrote:

> On 17-04-18 12:58 PM, Jesper Dangaard Brouer wrote:
> > 
> > As I argued in NetConf presentation[1] (from slide #9) we need a port
> > mapping table (instead of using ifindex'es).  Both for supporting
> > other "port" types than net_devices (think sockets), and for
> > sandboxing what XDP can bypass.
> > 
> > I want to create a new XDP action called XDP_REDIRECT, that instruct
> > XDP to send the xdp_buff to another "port" (get translated into a
> > net_device, or something else depending on internal port type).
> > 
> > Looking at the userspace/eBPF interface, I'm wondering what is the
> > best API for "returning" this port number from eBPF?
> > 
> > The options I see is:
> > 
> > 1) Split-up the u32 action code, and e.g let the high-16-bit be the
> >port number and lower-16bit the (existing) action verdict.
> > 
> >  Pros: Simple API
> >  Cons: Number of ports limited to 64K
> > 
> > 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow
> >eBPF to update xdp_md->port.
> > 
> >  Pros: Larger number of ports.
> >  Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md.
> >(see xdp_convert_ctx_access)
> > 
> > 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff.
> > 
> >  Pros: Hides impl details, and allows helper to give eBPF code feedback
> >(on e.g. if port doesn't exist any longer)
> >  Cons: Helper function call likely slower?
> > 
> >   
> 
> How about doing this the same way redirect is done in the tc case? I have this
> patch under test,
> 
>  
> https://github.com/jrfastab/linux/commit/e78f5425d5e3c305b4170ddd85c61c2e15359fee

I have been looking at this approach, which is close to option #3 above.

The problem with your implementation that you use a per-cpu store.
This creates the problem of storing state between packets. First packet
can call helper bpf_xdp_redirect() setting an ifindex, but program can
still return XDP_PASS.  Next packet can call XDP_REDIRECT and use the
ifindex set from the first packet.  IMHO this is a problematic API to
expose.

I do see that the TC interface that uses the same approach, via helper
bpf_redirect().  Maybe it have the same API problem?  Looking at
sch_handle_ingress() I don't see this is handled (e.g. by always
clearing this_cpu_ptr(redirect_info)->ifindex = 0).


> that should give you some idea. It just needs a port mapping table in the
> bpf_tx_xdp() call.

I'll take a closer look. I don't think we need the per-cpu-store
approach for XDP, as we might as well store the port info in xdp_buff,
or return it directly option #1.

(TC redirect need the per-cpu-store to avoid extending the SKB).


> > (Cc'ed xdp-newbies as end-users might have an opinion on UAPI?)

I would still like people to comment on the above options?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: XDP question: best API for returning/setting egress port?

2017-04-18 Thread John Fastabend
On 17-04-18 12:58 PM, Jesper Dangaard Brouer wrote:
> 
> As I argued in NetConf presentation[1] (from slide #9) we need a port
> mapping table (instead of using ifindex'es).  Both for supporting
> other "port" types than net_devices (think sockets), and for
> sandboxing what XDP can bypass.
> 
> I want to create a new XDP action called XDP_REDIRECT, that instruct
> XDP to send the xdp_buff to another "port" (get translated into a
> net_device, or something else depending on internal port type).
> 
> Looking at the userspace/eBPF interface, I'm wondering what is the
> best API for "returning" this port number from eBPF?
> 
> The options I see is:
> 
> 1) Split-up the u32 action code, and e.g let the high-16-bit be the
>port number and lower-16bit the (existing) action verdict.
> 
>  Pros: Simple API
>  Cons: Number of ports limited to 64K
> 
> 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow
>eBPF to update xdp_md->port.
> 
>  Pros: Larger number of ports.
>  Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md.
>(see xdp_convert_ctx_access)
> 
> 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff.
> 
>  Pros: Hides impl details, and allows helper to give eBPF code feedback
>(on e.g. if port doesn't exist any longer)
>  Cons: Helper function call likely slower?
> 
> 

How about doing this the same way redirect is done in the tc case? I have this
patch under test,

 
https://github.com/jrfastab/linux/commit/e78f5425d5e3c305b4170ddd85c61c2e15359fee

that should give you some idea. It just needs a port mapping table in the
bpf_tx_xdp() call.


> (Cc'ed xdp-newbies as end-users might have an opinion on UAPI?)
>