Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-26 Thread Tom Herbert via iovisor-dev
On Tue, Jul 26, 2016 at 10:53 AM, John Fastabend
 wrote:
> On 16-07-26 09:08 AM, Tom Herbert wrote:
>> On Tue, Jul 26, 2016 at 6:31 AM, Thomas Monjalon
>>  wrote:
>>> Hi,
>>>
>>> About RX filtering, there is an ongoing effort in DPDK to write an API
>>> which could leverage most of the hardware capabilities of any NICs:
>>> https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
>>> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/43352
>>> I understand that XDP does not aim to support every hardware feature,
>>> though it may be an interesting approach to check.
>>>
>> Thomas,
>>
>> A major goal of XDP is to leverage and in fact encourage innovation in
>> hardware features. But, we are asking that vendors design the APIs
>> with the community in mind. For instance, if XDP supports crypto
>> offload it should have one API that different companies share; we don't
>> want every vendor coming up with their own.
>
> The work in those threads is to create a single API for users of DPDK
> to interact with their hardware. The equivalent interface in the Linux
> kernel is ntuple filters from ethtool. The effort here is to make a
> usable interface to manage this from an application and also expose
> all the hardware features. Ethtool does a fairly poor job on both
> fronts IMO.
>
> If we evolve the mechanism to run per rx queue xdp programs this
> interface could easily be used to forward packets to specific rx
> queues and run targeted xdp programs.
>
> Integrating this functionality into running XDP programs as ebpf code
> seems a bit challenging to me because there is no software equivalent.
> Once XDP ebpf program is running the pkt has already landed on the rx
> queue. To me the mechanism to bind XDP programs to rx queues and steer
> specific flows (e.g. match a flow label and forward to a queue) needs
> to be part of the runtime environment not part of the main ebpf program
> itself. The runtime environment could use the above linked API. I know
> we debated earlier including this in the ebpf program itself but that
> really doesn't seem feasible to me. Whether the steering is expressed
> as an ebpf program or an API like above seems like a reasonable
> discussion. Perhaps a section could be used to describe the per program
> filter for example which would be different from an API approach used
> in the proposal above or the JIT could translate it into the above
> API for devices without instruction based hardware.
>
I think you're conflating two different mechanisms. If the device has
the capability to differentiate packets and steer them to different
queues, then XDP should be able to make use of that by running
different programs appropriate for each queue. Packet steering is the
domain of HW, for that we have ntuple filtering now but there is no
reason to believe that XDP won't be used for that also. Per queue
program in the host is just configuration, i.e. bind this program to
that queue. Even if the first is not ready, I don't see why the second
is so complex; it should just be a matter of per queue configuration
for which there is already a lot of infrastructure.
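
As a rough illustration of "bind this program to that queue" being plain
configuration, here is a minimal userspace C sketch. It is not kernel code
and not an existing XDP interface: the names queue_prog, bind_prog,
run_queue_prog, vm_prog and host_prog, the verdict values and the queue
count are all hypothetical, chosen only to show the shape of a per queue
program table.

```c
#include <stddef.h>

/* Hypothetical verdicts for this sketch; not the kernel's XDP action codes. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

struct pkt {
    const unsigned char *data;
    size_t len;
};

typedef enum verdict (*queue_prog_fn)(const struct pkt *p);

#define NUM_RX_QUEUES 4   /* assumed queue count, for illustration only */

/* One program slot per RX queue; NULL means no program is bound. */
static queue_prog_fn queue_prog[NUM_RX_QUEUES];

/* Configuration step: bind a program to a queue. Returns 0 on success. */
int bind_prog(unsigned int queue, queue_prog_fn fn)
{
    if (queue >= NUM_RX_QUEUES)
        return -1;
    queue_prog[queue] = fn;
    return 0;
}

/* Receive path: the packet has already landed on `queue` (steering is
 * assumed to have happened in hardware before this point). */
enum verdict run_queue_prog(unsigned int queue, const struct pkt *p)
{
    if (queue >= NUM_RX_QUEUES || !queue_prog[queue])
        return VERDICT_PASS;   /* nothing bound: leave it to the stack */
    return queue_prog[queue](p);
}

/* Two stand-in programs, e.g. one for VM traffic and one for host traffic. */
enum verdict vm_prog(const struct pkt *p)
{
    (void)p;
    return VERDICT_PASS;
}

enum verdict host_prog(const struct pkt *p)
{
    /* Drop anything shorter than an Ethernet header (14 bytes). */
    return p->len < 14 ? VERDICT_DROP : VERDICT_PASS;
}
```

With this shape, binding vm_prog to queue 0 and host_prog to queue 1 is two
calls to bind_prog; how packets reach those queues stays a separate question.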

> Step 0 should be to show a set of compelling use cases that want to run
> per queue programs then we can talk about the runtime.

Imagine we are able to split VM traffic and non-VM traffic to
different RX queues. The program we will want to run would be very
different in those cases.

Tom

>
>
>>
>>> 2016-07-12 22:32, Jesper Dangaard Brouer via iovisor-dev:
>>>> On Tue, 12 Jul 2016 12:13:01 -0700
>>>> John Fastabend  wrote:
>>>>>
>>>>> Another use case I have is to make a really high performance AF_PACKET
>>>>> interface. So if there was a way to say bind a queue to an AF_PACKET
>>>>> ring and run a policy XDP program before hitting the AF_PACKET
>>>>> descriptor bit that would be really interesting because it would solve
>>>>> some of my need for poll mode drivers in userspace.
>>>
>>> Have you started this work?
>>> Do you have an idea of how RX would perform through XDP + AF_PACKET + DPDK?
>>>
>> I don't understand why the AF_PACKET with DPDK. They should be
>> mutually exclusive. XDP over DPDK does make sense.
>>
>
> Because DPDK is more than just a poll mode driver that binds to a
> device. AF_Packet could be used as a replacement for a specific poll
> mode driver but the application could still use the other libraries
> provided by DPDK to build ACLs for example or deep packet inspection.
>
>
>> Tom
>>
>
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-12 Thread Jakub Kicinski via iovisor-dev
On Tue, 12 Jul 2016 12:13:01 -0700, John Fastabend wrote:
> On 16-07-11 07:24 PM, Alexei Starovoitov wrote:
> > On Sat, Jul 09, 2016 at 01:27:26PM +0200, Jesper Dangaard Brouer wrote:  
> >> On Fri, 8 Jul 2016 18:51:07 +0100
> >> Jakub Kicinski  wrote:
> >>  
> >>> On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:  
> >>>> The only distinction between VFs and queue groupings on my side is VFs
> >>>> provide RSS whereas queue groupings have to be selected explicitly.
> >>>> In a programmable NIC world the distinction might be lost if a "RSS"
> >>>> program can be loaded into the NIC to select queues but for existing
> >>>> hardware the distinction is there.
> >>>
> >>> To do BPF RSS we need a way to select the queue which I think is all
> >>> Jesper wanted.  So we will have to tackle the queue selection at some
> >>> point.  The main obstacle with it for me is to define what queue
> >>> selection means when program is not offloaded to HW...  Implementing
> >>> queue selection on HW side is trivial.  
> >>
> >> Yes, I do see the problem of fallback, when the program's "filter" demux
> >> cannot be offloaded to hardware.
> >>
> >> First I thought it was a good idea to keep the "demux-filter" part of
> >> the eBPF program, as software fallback can still apply this filter in
> >> SW, and just mark the packets as not-zero-copy-safe.  But when HW
> >> offloading is not possible, then packets can be delivered to every RX
> >> queue, and SW would need to handle that, which is hard to keep transparent.
> >>
> >>  
> >>>> If you demux using an eBPF program or via a filter model like
> >>>> flow_director or cls_{u32|flower} I think we can support both. And this
> >>>> just depends on the programmability of the hardware. Note flow_director
> >>>> and cls_{u32|flower} steering to VFs is already in place.
> >>
> >> Maybe we should keep HW demuxing as a separate setup step.
> >>
> >> Today I can almost do what I want: by setting up ntuple filters, and (if
> >> Alexei allows it) assign an application specific XDP eBPF program to a
> >> specific RX queue.
> >>
> >>  ethtool -K eth2 ntuple on
> >>  ethtool -N eth2 flow-type udp4 dst-ip 192.168.254.1 dst-port 53 action 42
> >>
> >> Then the XDP program can be attached to RX queue 42, and
> >> promise/guarantee that it will consume all packets.  And then the
> >> backing page-pool can allow zero-copy RX (and enable scrubbing when
> >> refilling pool).  
> > 
> > so such ntuple rule will send udp4 traffic for specific ip and port
> > into a queue then it will somehow get zero-copied to vm?
> > . looks like a lot of other pieces about zero-copy and qemu need to be
> > implemented (or at least architected) for this scheme to be conceivable
> > . and when all that happens what vm is going to do with this very specific
> > traffic? vm won't have any tcp or even ping?  
> 
> I have perhaps a different motivation to have queue steering in 'tc
> cls-u32' and eventually xdp. The general idea is I have thousands of
> queues and I can bind applications to the queues. When I know an
> application is bound to a queue I can enable per queue busy polling (to
> be implemented), set specific interrupt rates on the queue
> (implementation will be posted soon), bind the queue to the correct
> cpu, etc.
> 
> ntuple works OK for this now but xdp provides more flexibility and
> also lets us add additional policy on the queue other than simply
> queue steering.
> 
> I'm not convinced though that the demux queue selection should be part
> of the XDP program itself, simply because it has no software analog. To me
> it sits in front of the set of XDP programs.

Yes, although if we expect XDP to be target of offloading efforts
putting the demux here doesn't seem like an entirely bad idea.  We
could say demux is just an API that more capable drivers/HW can
implement.

> But I think I could perhaps
> be convinced it does if there is some reasonable way to do it. I guess
> the single program method would result in an XDP program that read like
> 
>   if (rx_queue == x)
>    do_foo
>   if (rx_queue == y)
>    do_bar
> 
> A hardware jit may be able to sort that out.

+1  


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-12 Thread John Fastabend via iovisor-dev
On 16-07-11 07:24 PM, Alexei Starovoitov wrote:
> On Sat, Jul 09, 2016 at 01:27:26PM +0200, Jesper Dangaard Brouer wrote:
>> On Fri, 8 Jul 2016 18:51:07 +0100
>> Jakub Kicinski  wrote:
>>
>>> On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:
>>>> The only distinction between VFs and queue groupings on my side is VFs
>>>> provide RSS whereas queue groupings have to be selected explicitly.
>>>> In a programmable NIC world the distinction might be lost if a "RSS"
>>>> program can be loaded into the NIC to select queues but for existing
>>>> hardware the distinction is there.
>>>
>>> To do BPF RSS we need a way to select the queue which I think is all
>>> Jesper wanted.  So we will have to tackle the queue selection at some
>>> point.  The main obstacle with it for me is to define what queue
>>> selection means when program is not offloaded to HW...  Implementing
>>> queue selection on HW side is trivial.
>>
>> Yes, I do see the problem of fallback, when the program's "filter" demux
>> cannot be offloaded to hardware.
>>
>> First I thought it was a good idea to keep the "demux-filter" part of
>> the eBPF program, as software fallback can still apply this filter in
>> SW, and just mark the packets as not-zero-copy-safe.  But when HW
>> offloading is not possible, then packets can be delivered to every RX
>> queue, and SW would need to handle that, which is hard to keep transparent.
>>
>>
>>>> If you demux using an eBPF program or via a filter model like
>>>> flow_director or cls_{u32|flower} I think we can support both. And this
>>>> just depends on the programmability of the hardware. Note flow_director
>>>> and cls_{u32|flower} steering to VFs is already in place.
>>
>> Maybe we should keep HW demuxing as a separate setup step.
>>
>> Today I can almost do what I want: by setting up ntuple filters, and (if
>> Alexei allows it) assign an application specific XDP eBPF program to a
>> specific RX queue.
>>
>>  ethtool -K eth2 ntuple on
>>  ethtool -N eth2 flow-type udp4 dst-ip 192.168.254.1 dst-port 53 action 42
>>
>> Then the XDP program can be attached to RX queue 42, and
>> promise/guarantee that it will consume all packets.  And then the
>> backing page-pool can allow zero-copy RX (and enable scrubbing when
>> refilling pool).
> 
> so such ntuple rule will send udp4 traffic for specific ip and port
> into a queue then it will somehow get zero-copied to vm?
> . looks like a lot of other pieces about zero-copy and qemu need to be
> implemented (or at least architected) for this scheme to be conceivable
> . and when all that happens what vm is going to do with this very specific
> traffic? vm won't have any tcp or even ping?

I have perhaps a different motivation to have queue steering in 'tc
cls-u32' and eventually xdp. The general idea is I have thousands of
queues and I can bind applications to the queues. When I know an
application is bound to a queue I can enable per queue busy polling (to
be implemented), set specific interrupt rates on the queue
(implementation will be posted soon), bind the queue to the correct
cpu, etc.

ntuple works OK for this now but xdp provides more flexibility and
also lets us add additional policy on the queue other than simply
queue steering.

I'm not convinced though that the demux queue selection should be part
of the XDP program itself, simply because it has no software analog. To me
it sits in front of the set of XDP programs. But I think I could perhaps
be convinced it does if there is some reasonable way to do it. I guess
the single program method would result in an XDP program that read like

  if (rx_queue == x)
   do_foo
  if (rx_queue == y)
   do_bar

A hardware jit may be able to sort that out. Or use per queue sections.
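
To make the single program method a little more concrete, here is a small
userspace C sketch of one entry point that dispatches on the receive queue.
It is only an illustration: struct pkt_ctx, its rx_queue field, the queue
numbers QUEUE_X and QUEUE_Y and the bodies of do_foo and do_bar are all
assumptions, and nothing here is an existing XDP context or helper.

```c
#include <stddef.h>

/* Hypothetical verdicts for this sketch; not the kernel's XDP action codes. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Hypothetical per-packet context. rx_queue is assumed to be filled in by
 * the driver (or hardware) before the program runs. */
struct pkt_ctx {
    unsigned int rx_queue;
    size_t len;
};

#define QUEUE_X 3u    /* placeholder queue numbers */
#define QUEUE_Y 42u

/* Stand-ins for the per-queue logic. */
enum verdict do_foo(const struct pkt_ctx *ctx)
{
    return ctx->len ? VERDICT_PASS : VERDICT_DROP;
}

enum verdict do_bar(const struct pkt_ctx *ctx)
{
    (void)ctx;
    return VERDICT_DROP;
}

/* The single program: one entry point, branching on the queue the packet
 * arrived on. A JIT that knows the queue at load time could, in principle,
 * specialize this into one body per queue. */
enum verdict single_prog(const struct pkt_ctx *ctx)
{
    switch (ctx->rx_queue) {
    case QUEUE_X:
        return do_foo(ctx);
    case QUEUE_Y:
        return do_bar(ctx);
    default:
        return VERDICT_PASS;   /* queues without a policy pass through */
    }
}
```

The per queue sections alternative would amount to compiling do_foo and
do_bar as separate programs and letting the loader attach each one.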

> 
> the network virtualization traffic is typically encapsulated,
> so if xdp is used to steer the traffic, the program would need
> to figure out vm id based on headers, strip tunnel, apply policy before
> forwarding the packet further. Clearly hw ntuple is not going to suffice.
>
> If there is no networking virtualization and VMs are operating in the
> flat network, then there is no policy, no ip filter, no vm migration.
> Only mac per vm and sriov handles this case just fine.
> When hw becomes more programmable we'll be able to load xdp program
> into hw that does tunnel, policy and forwards into vf then sriov will
> become actually usable for cloud providers.

Yep :)

> hw xdp into vf is more interesting than into a queue, since there is
> more than one queue/interrupt per vf and network heavy vm can actually
> consume large amount of traffic.
> 

Another use case I have is to make a really high performance AF_PACKET
interface. So if there was a way to say bind a queue to an AF_PACKET
ring and run a policy XDP program before hitting the AF_PACKET
descriptor bit that would be really interesting because it would solve
some of my need for poll mode drivers in userspace.

.John


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-11 Thread Alexei Starovoitov via iovisor-dev
On Sat, Jul 09, 2016 at 01:27:26PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 8 Jul 2016 18:51:07 +0100
> Jakub Kicinski  wrote:
> 
> > On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:
> > > The only distinction between VFs and queue groupings on my side is VFs
> > > provide RSS whereas queue groupings have to be selected explicitly.
> > > In a programmable NIC world the distinction might be lost if a "RSS"
> > > program can be loaded into the NIC to select queues but for existing
> > > hardware the distinction is there.  
> > 
> > To do BPF RSS we need a way to select the queue which I think is all
> > Jesper wanted.  So we will have to tackle the queue selection at some
> > point.  The main obstacle with it for me is to define what queue
> > selection means when program is not offloaded to HW...  Implementing
> > queue selection on HW side is trivial.
> 
> Yes, I do see the problem of fallback, when the program's "filter" demux
> cannot be offloaded to hardware.
> 
> First I thought it was a good idea to keep the "demux-filter" part of
> the eBPF program, as software fallback can still apply this filter in
> SW, and just mark the packets as not-zero-copy-safe.  But when HW
> offloading is not possible, then packets can be delivered to every RX
> queue, and SW would need to handle that, which is hard to keep transparent.
> 
> 
> > > If you demux using an eBPF program or via a filter model like
> > > flow_director or cls_{u32|flower} I think we can support both. And this
> > > just depends on the programmability of the hardware. Note flow_director
> > > and cls_{u32|flower} steering to VFs is already in place.  
> 
> Maybe we should keep HW demuxing as a separate setup step.
> 
> Today I can almost do what I want: by setting up ntuple filters, and (if
> Alexei allows it) assign an application specific XDP eBPF program to a
> specific RX queue.
> 
>  ethtool -K eth2 ntuple on
>  ethtool -N eth2 flow-type udp4 dst-ip 192.168.254.1 dst-port 53 action 42
> 
> Then the XDP program can be attached to RX queue 42, and
> promise/guarantee that it will consume all packets.  And then the
> backing page-pool can allow zero-copy RX (and enable scrubbing when
> refilling pool).

so such ntuple rule will send udp4 traffic for specific ip and port
into a queue then it will somehow get zero-copied to vm?
. looks like a lot of other pieces about zero-copy and qemu need to be
implemented (or at least architected) for this scheme to be conceivable
. and when all that happens what vm is going to do with this very specific
traffic? vm won't have any tcp or even ping?

the network virtualization traffic is typically encapsulated,
so if xdp is used to steer the traffic, the program would need
to figure out vm id based on headers, strip tunnel, apply policy before
forwarding the packet further. Clearly hw ntuple is not going to suffice.

If there is no networking virtualization and VMs are operating in the
flat network, then there is no policy, no ip filter, no vm migration.
Only mac per vm and sriov handles this case just fine.
When hw becomes more programmable we'll be able to load xdp program
into hw that does tunnel, policy and forwards into vf then sriov will
become actually usable for cloud providers.
hw xdp into vf is more interesting than into a queue, since there is
more than one queue/interrupt per vf and network heavy vm can actually
consume large amount of traffic.



Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-08 Thread Jakub Kicinski via iovisor-dev
On Thu, 7 Jul 2016 19:22:12 -0700, Alexei Starovoitov wrote:
> > If the goal is to just separate XDP traffic from non-XDP traffic you could 
> > accomplish this with a combination of SR-IOV/macvlan to separate the device 
> > queues into multiple netdevs and then run XDP on just one of the netdevs. 
> > Then use flow director (ethtool) or 'tc cls_u32/flower' to steer traffic to 
> > the netdev. This is how we support multiple networking stacks on one device 
> > by the way; it is called the bifurcated driver. It's not too far of a stretch 
> > to think we could offload some simple XDP programs to program the splitting 
> > of traffic instead of cls_u32/flower/flow_director and then you would have 
> > a stack of XDP programs. One running in hardware and a set running on the 
> > queues in software.  
> 
> the above sounds like much better approach than Jesper/mine prog_per_ring 
> stuff.
> If we can split the nic via sriov and have dedicated netdev via VF just for 
> XDP that's way cleaner approach.
> I guess we won't need to do xdp_rxqmask after all.

+1

I was thinking about using eBPF to direct to NIC queues but concluded
that doing a redirect to a VF is cleaner.  Especially if the PF driver
supports VF representatives we could potentially just use
bpf_redirect(VFR netdev) and the VF doesn't even have to be handled by
the same stack.


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-07 Thread Alexei Starovoitov via iovisor-dev
On Thu, Jul 07, 2016 at 09:05:29PM -0700, John Fastabend wrote:
> On 16-07-07 07:22 PM, Alexei Starovoitov wrote:
> > On Thu, Jul 07, 2016 at 03:18:11PM +, Fastabend, John R wrote:
> >> Hi Jesper,
> >>
> >> I have done some previous work on proprietary systems where we
> >> used hardware to do the classification/parsing then passed a cookie to the
> >> software which used the cookie to lookup a program to run on the packet.
> >> When your programs are structured as a bunch of parsing followed by some
> >> actions this can provide real performance benefits. Also a lot of
> >> existing hardware supports this today assuming you use headers the
> >> hardware "knows" about. It's a natural model for hardware that uses a
> >> parser followed by tcam/cam/sram/etc lookup tables.
> 
> > looking at bpf programs written in plumgrid, facebook and cisco
> > with full certainty I can assure that parse/action split doesn't exist.
> > Parsing is always interleaved with lookups and actions.
> > cpu spends a tiny fraction of time doing parsing. Lookups are the heaviest.
> 
> What is heavy about a lookup? Is it the key generation? The key
> generation can be provided by the hardware is what I was really alluding
> to. If your data structures are ebpf maps though it's probably a hash
> or array table and the benefit of leveraging hardware would likely be
> much better if/when there are software structures for LPM or wildcard
> lookups.

there is only hash map in the sw and the main cost of it was doing jhash
math and occasional miss in hashtable.
'key generation' is only copying bytes, so it's mostly free.
Just like parsing which is few branches which tend to be predicted
by cpu quite well.
In case of our L4 loadbalancer we need to do consistent hash which
fixed hw probably won't be able to provide.
Unless hw is programmable :)
In general when we developed and benchmarked the programs,
redesigning the program to remove extra hash lookup gave performance
improvement whereas simplifying parsing logic (like removing vlan
handling or ip option) showed no difference in performance.
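
For readers unfamiliar with the consistent hash requirement mentioned above,
here is a minimal C sketch of one well-known way to get that property:
rendezvous (highest random weight) hashing. This is an illustration only and
is not claimed to be what the load balancer in question does. The function
names mix64 and pick_backend are hypothetical, and the mixer is the
splitmix64 finalizer used as a stand-in hash, not jhash.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in 64-bit mixer (splitmix64 finalizer); any good hash would do. */
static uint64_t mix64(uint64_t x)
{
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Rendezvous hashing: score every live backend against the flow key and
 * pick the backend with the highest score.
 *
 * backend_ids holds non-negative ids (an assumption of this sketch).
 * Returns the chosen id, or -1 if the list is empty. */
int pick_backend(uint64_t flow_key, const int *backend_ids, size_t n)
{
    int best = -1;
    uint64_t best_score = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t id = (uint64_t)(unsigned int)backend_ids[i];
        uint64_t score = mix64(flow_key ^ mix64(id + 0x9e3779b97f4a7c15ULL));

        if (best < 0 || score > best_score) {
            best = backend_ids[i];
            best_score = score;
        }
    }
    return best;
}
```

The useful property is that removing a backend only remaps the flows that
were on it; every other flow keeps its backend. The cost is one hash per
backend per lookup, which is the kind of extra hash work the benchmarking
remarks above warn about.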

> > Trying to split single logical program into parsing/after_parse stages
> > has no practical benefit.
> > 
> >> If the goal is to just separate XDP traffic from non-XDP traffic
> >> you could accomplish this with a combination of SR-IOV/macvlan to separate
> >> the device queues into multiple netdevs and then run XDP on just one of
> >> the netdevs. Then use flow director (ethtool) or 'tc cls_u32/flower' to
> >> steer traffic to the netdev. This is how we support multiple networking
> >> stacks on one device, by the way; it is called the bifurcated driver. It's
> >> not too far of a stretch to think we could offload some simple XDP
> >> programs to program the splitting of traffic instead of
> >> cls_u32/flower/flow_director and then you would have a stack of XDP
> >> programs. One running in hardware and a set running on the queues in
> >> software.
> > 
> > the above sounds like much better approach than Jesper/mine prog_per_ring 
> > stuff.
> > If we can split the nic via sriov and have dedicated netdev via VF just for 
> > XDP that's way cleaner approach.
> > I guess we won't need to do xdp_rxqmask after all.
> > 
> 
> Right and this works today so all it would require is adding the XDP
> engine code to the VF drivers. Which should be relatively straight
> forward if you have the PF driver working.

Good point. I think the next step should be to enable xdp in VF drivers
and measure performance.



Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-07 Thread John Fastabend via iovisor-dev
On 16-07-07 07:22 PM, Alexei Starovoitov wrote:
> On Thu, Jul 07, 2016 at 03:18:11PM +, Fastabend, John R wrote:
>> Hi Jesper,
>>
>> I have done some previous work on proprietary systems where we
>> used hardware to do the classification/parsing then passed a cookie to the
>> software which used the cookie to lookup a program to run on the packet.
>> When your programs are structured as a bunch of parsing followed by some
>> actions this can provide real performance benefits. Also a lot of
>> existing hardware supports this today assuming you use headers the
>> hardware "knows" about. It's a natural model for hardware that uses a
>> parser followed by tcam/cam/sram/etc lookup tables.

> looking at bpf programs written in plumgrid, facebook and cisco
> with full certainty I can assure that parse/action split doesn't exist.
> Parsing is always interleaved with lookups and actions.
> cpu spends a tiny fraction of time doing parsing. Lookups are the heaviest.

What is heavy about a lookup? Is it the key generation? The key
generation can be provided by the hardware is what I was really alluding
to. If your data structures are ebpf maps though it's probably a hash
or array table and the benefit of leveraging hardware would likely be
much better if/when there are software structures for LPM or wildcard
lookups.

> Trying to split single logical program into parsing/after_parse stages
> has no practical benefit.
> 
>> If the goal is to just separate XDP traffic from non-XDP traffic
>> you could accomplish this with a combination of SR-IOV/macvlan to separate
>> the device queues into multiple netdevs and then run XDP on just one of
>> the netdevs. Then use flow director (ethtool) or 'tc cls_u32/flower' to
>> steer traffic to the netdev. This is how we support multiple networking
>> stacks on one device, by the way; it is called the bifurcated driver. It's
>> not too far of a stretch to think we could offload some simple XDP
>> programs to program the splitting of traffic instead of
>> cls_u32/flower/flow_director and then you would have a stack of XDP
>> programs. One running in hardware and a set running on the queues in
>> software.
> 
> the above sounds like much better approach than Jesper/mine prog_per_ring 
> stuff.
> If we can split the nic via sriov and have dedicated netdev via VF just for 
> XDP that's way cleaner approach.
> I guess we won't need to do xdp_rxqmask after all.
> 

Right and this works today so all it would require is adding the XDP
engine code to the VF drivers. Which should be relatively straight
forward if you have the PF driver working.

.John


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-07 Thread Alexei Starovoitov via iovisor-dev
On Thu, Jul 07, 2016 at 03:18:11PM +, Fastabend, John R wrote:
> Hi Jesper,
> 
> I have done some previous work on proprietary systems where we used hardware 
> to do the classification/parsing then passed a cookie to the software which 
> used the cookie to lookup a program to run on the packet. When your programs 
> are structured as a bunch of parsing followed by some actions this can 
> provide real performance benefits. Also a lot of existing hardware supports 
> this today assuming you use headers the hardware "knows" about. It's a 
> natural model for hardware that uses a parser followed by tcam/cam/sram/etc 
> lookup tables.

looking at bpf programs written in plumgrid, facebook and cisco
with full certainty I can assure that parse/action split doesn't exist.
Parsing is always interleaved with lookups and actions.
cpu spends a tiny fraction of time doing parsing. Lookups are the heaviest.
Trying to split single logical program into parsing/after_parse stages
has no practical benefit.

> If the goal is to just separate XDP traffic from non-XDP traffic you could 
> accomplish this with a combination of SR-IOV/macvlan to separate the device 
> queues into multiple netdevs and then run XDP on just one of the netdevs. 
> Then use flow director (ethtool) or 'tc cls_u32/flower' to steer traffic to 
> the netdev. This is how we support multiple networking stacks on one device 
> by the way; it is called the bifurcated driver. It's not too far of a stretch 
> to think we could offload some simple XDP programs to program the splitting 
> of traffic instead of cls_u32/flower/flow_director and then you would have a 
> stack of XDP programs. One running in hardware and a set running on the 
> queues in software.

the above sounds like much better approach than Jesper/mine prog_per_ring stuff.
If we can split the nic via sriov and have dedicated netdev via VF just for XDP 
that's way cleaner approach.
I guess we won't need to do xdp_rxqmask after all.



Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-07 Thread John Fastabend via iovisor-dev
On 16-07-07 10:53 AM, Tom Herbert wrote:
> On Thu, Jul 7, 2016 at 9:12 AM, Jakub Kicinski
>  wrote:
>> On Thu, 7 Jul 2016 15:18:11 +, Fastabend, John R wrote:
>>> The other interesting thing would be to do more than just packet
>>> steering but actually run a more complete XDP program. Netronome
>>> supports this, right? The question I have though is whether this is a
>>> stack of XDP programs, one or more designated for hardware and some
>>> running in software, perhaps with some annotation in the program so the
>>> hardware JIT knows where to place programs, or do we expect the JIT
>>> itself to try and decide what is best to offload. I think the easiest to
>>> start with is to annotate the programs.
>>>
>>> Also as far as I know a lot of hardware can stick extra data to the
>>> front or end of a packet so you could push metadata calculated by the
>>> program here in a generic way without having to extend XDP defined
>>> metadata structures. Another option is to DMA the metadata to a
>>> specified address. With this metadata the consumer/producer XDP
>>> programs have to agree on the format but no one else.
>>
>> Yes!
>>
>> At the XDP summit we were discussing pipe-lining XDP programs in
>> general, with different stages of the pipeline potentially using
>> specific hardware capabilities or even being directly mappable on
>> fixed HW functions.
>>
>> Designating parsing as one of specialized blocks makes sense in a long
>> run, probably at the first stage with recirculation possible.  We also
>> have some parsing HW we could utilize at some point.  However, I'm
>> worried that it's too early to impose constraints and APIs.  I agree
>> that we should first set a standard way to pass metadata across tail
>> calls to facilitate any form of pipe lining, regardless of which parts
>> of pipeline HW is able to offload.
> 
> +1
> 
> I don't see any reason why XDP programs can't be turned into a pipeline,
> but this is implementation based on the output of one program being
> the input of the next.  While XDP may work with pipeline it does not
> require it or define it. This makes XDP different from P4 and the
> match-action paradigm.
> 
> Tom
> 

Sounds like we all agree. Just a note, XDP is a reasonable target
for P4; in fact we have a P4 to eBPF target already working. We may end
up with a set of DSLs running on top of XDP where P4 is one of them.

.John


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-07 Thread Tom Herbert via iovisor-dev
On Thu, Jul 7, 2016 at 9:12 AM, Jakub Kicinski
 wrote:
> On Thu, 7 Jul 2016 15:18:11 +, Fastabend, John R wrote:
>> The other interesting thing would be to do more than just packet
>> steering but actually run a more complete XDP program. Netronome
>> supports this, right? The question I have though is whether this is a
>> stack of XDP programs, one or more designated for hardware and some
>> running in software, perhaps with some annotation in the program so the
>> hardware JIT knows where to place programs, or do we expect the JIT
>> itself to try and decide what is best to offload. I think the easiest to
>> start with is to annotate the programs.
>>
>> Also as far as I know a lot of hardware can stick extra data to the
>> front or end of a packet so you could push metadata calculated by the
>> program here in a generic way without having to extend XDP defined
>> metadata structures. Another option is to DMA the metadata to a
>> specified address. With this metadata the consumer/producer XDP
>> programs have to agree on the format but no one else.
>
> Yes!
>
> At the XDP summit we were discussing pipelining XDP programs in
> general, with different stages of the pipeline potentially using
> specific hardware capabilities or even being directly mappable onto
> fixed HW functions.
>
> Designating parsing as one of the specialized blocks makes sense in
> the long run, probably as the first stage, with recirculation
> possible.  We also have some parsing HW we could utilize at some
> point.  However, I'm worried that it's too early to impose
> constraints and APIs.  I agree that we should first set a standard
> way to pass metadata across tail calls to facilitate any form of
> pipelining, regardless of which parts of the pipeline the HW is able
> to offload.

+1

I don't see any reason why XDP programs can't be turned into a
pipeline, but this is an implementation based on the output of one
program being the input of the next.  While XDP may work with a
pipeline, it does not require it or define it. This makes XDP
different from P4 and the match-action paradigm.

Tom


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-07 Thread Jakub Kicinski via iovisor-dev
On Thu, 7 Jul 2016 15:18:11 +, Fastabend, John R wrote:
> The other interesting thing would be to do more than just packet
> steering and actually run a more complete XDP program. Netronome
> supports this, right? The question I have, though, is whether this
> is a stack of XDP programs, one or more designated for hardware and
> some running in software, perhaps with some annotation in the
> program so the hardware JIT knows where to place programs, or
> whether we expect the JIT itself to try to decide what is best to
> offload. I think the easiest way to start is to annotate the
> programs.
>
> Also, as far as I know, a lot of hardware can attach extra data to
> the front or end of a packet, so you could push metadata calculated
> by the program there in a generic way without having to extend
> XDP-defined metadata structures. Another option is to DMA the
> metadata to a specified address. Either way, only the producer and
> consumer XDP programs have to agree on the metadata format; no one
> else does.

Yes!

At the XDP summit we were discussing pipelining XDP programs in
general, with different stages of the pipeline potentially using
specific hardware capabilities or even being directly mappable onto
fixed HW functions.

Designating parsing as one of the specialized blocks makes sense in
the long run, probably as the first stage, with recirculation
possible.  We also have some parsing HW we could utilize at some
point.  However, I'm worried that it's too early to impose constraints
and APIs.  I agree that we should first set a standard way to pass
metadata across tail calls to facilitate any form of pipelining,
regardless of which parts of the pipeline the HW is able to offload.


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-07 Thread Fastabend, John R via iovisor-dev
Hi Jesper,

I have done some previous work on proprietary systems where we used hardware to
do the classification/parsing and then passed a cookie to the software, which
used the cookie to look up a program to run on the packet. When your programs
are structured as a bunch of parsing followed by some actions, this can provide
real performance benefits. Also, a lot of existing hardware supports this today,
assuming you use headers the hardware "knows" about. It's a natural model for
hardware that uses a parser followed by TCAM/CAM/SRAM/etc. lookup tables.
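
A minimal sketch of that cookie model, in plain C rather than eBPF. Everything
named here (prog_table, run_by_cookie, the ACT_* codes, the table size) is
hypothetical and only illustrates the idea that the hardware classifies and the
software merely indexes a table with the cookie it was handed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COOKIES 4          /* hypothetical table size */

/* Hypothetical action codes, loosely modelled on XDP_DROP / XDP_PASS. */
enum { ACT_DROP = 0, ACT_PASS = 1 };

struct packet {
    const uint8_t *data;
    size_t len;
};

typedef int (*prog_fn)(const struct packet *pkt);

int prog_drop(const struct packet *pkt) { (void)pkt; return ACT_DROP; }
int prog_pass(const struct packet *pkt) { (void)pkt; return ACT_PASS; }

/* One program per cookie; slots 2 and 3 are left empty (NULL). */
const prog_fn prog_table[MAX_COOKIES] = {
    [0] = prog_pass,
    [1] = prog_drop,
};

/* Software side: no parsing, just a table lookup keyed by the HW cookie. */
int run_by_cookie(uint32_t cookie, const struct packet *pkt)
{
    assert(pkt != NULL);
    if (cookie >= MAX_COOKIES || prog_table[cookie] == NULL)
        return ACT_PASS;       /* unknown cookie: fall back to the normal stack */
    return prog_table[cookie](pkt);
}
```

Falling back to ACT_PASS for an unknown cookie is just one possible policy; it
keeps traffic the hardware could not classify on the regular path.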

If the goal is just to separate XDP traffic from non-XDP traffic, you could
accomplish this with a combination of SR-IOV/macvlan to separate the device
queues into multiple netdevs and then run XDP on just one of the netdevs. Then
use flow director (ethtool) or 'tc cls_u32/flower' to steer traffic to the
netdev. This is how we support multiple networking stacks on one device, by the
way; it is called the bifurcated driver. It's not too much of a stretch to think
we could offload some simple XDP programs to do the splitting of traffic
instead of cls_u32/flower/flow_director, and then you would have a stack of XDP
programs: one running in hardware and a set running on the queues in software.

The other interesting thing would be to do more than just packet steering and
actually run a more complete XDP program. Netronome supports this, right? The
question I have, though, is whether this is a stack of XDP programs, one or more
designated for hardware and some running in software, perhaps with some
annotation in the program so the hardware JIT knows where to place programs, or
whether we expect the JIT itself to try to decide what is best to offload. I
think the easiest way to start is to annotate the programs.

Also, as far as I know, a lot of hardware can attach extra data to the front or
end of a packet, so you could push metadata calculated by the program there in a
generic way without having to extend XDP-defined metadata structures. Another
option is to DMA the metadata to a specified address. Either way, only the
producer and consumer XDP programs have to agree on the metadata format; no one
else does.
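
A rough sketch of the "metadata in front of the packet" variant, again in plain
C. The struct flow_meta layout and the meta_push/meta_pull helpers are made up
for illustration; the point is only that the layout is a private contract
between one producer and one consumer, and nothing else needs to know it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, known only to the producer and the consumer. */
struct flow_meta {
    uint16_t proto_id;
    uint16_t l4_offset;
};

/*
 * Producer: copy the metadata into the headroom directly in front of the
 * packet. Returns the new start of the buffer, or NULL if the headroom
 * between buf_start and pkt_start is too small.
 */
uint8_t *meta_push(uint8_t *buf_start, uint8_t *pkt_start,
                   const struct flow_meta *m)
{
    assert(pkt_start >= buf_start);
    if ((size_t)(pkt_start - buf_start) < sizeof(*m))
        return NULL;
    uint8_t *p = pkt_start - sizeof(*m);
    memcpy(p, m, sizeof(*m));
    return p;
}

/*
 * Consumer: read the metadata back and return a pointer to the original
 * packet start, i.e. the first byte after the metadata.
 */
const uint8_t *meta_pull(const uint8_t *p, struct flow_meta *out)
{
    memcpy(out, p, sizeof(*out));
    return p + sizeof(*out);
}
```

memcpy is used instead of a pointer cast so the sketch does not depend on the
alignment of the packet buffer.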

FWIW I was hoping to get some data to show performance overhead vs. how deep we
parse into the packets. I just won't have time to get to it for a while, but
that could tell us how much perf gain the hardware could provide.

Thanks,
John

-----Original Message-----
From: Jesper Dangaard Brouer [mailto:bro...@redhat.com] 
Sent: Thursday, July 7, 2016 3:43 AM
To: iovisor-dev@lists.iovisor.org
Cc: bro...@redhat.com; Brenden Blanco; Alexei Starovoitov; Rana Shahout;
Ari Saha; Tariq Toukan; Or Gerlitz; net...@vger.kernel.org; Simon Horman;
Jakub Kicinski; Edward Cree; Fastabend, John R
Subject: XDP seeking input from NIC hardware vendors


Would it make sense, from a hardware point of view, to split the XDP eBPF
program into two stages?

 Stage-1: Filter (restricted eBPF / no-helper calls)
 Stage-2: Program

Then the HW can choose to offload stage-1 "filter", and keep the likely more 
advanced stage-2 on the kernel side.  Do HW vendors see a benefit of this 
approach?


The generic problem I'm trying to solve is parsing. That is, the first step in
every XDP program will be to parse the packet data in order to determine whether
this is a packet the XDP program should process.

Actions from stage-1 "filter" program:
 - DROP (like XDP_DROP, early drop)
 - PASS (like XDP_PASS, normal netstack)
 - MATCH (call stage-2, likely carrying over an opaque return code)

The MATCH action should likely carry over an opaque return code that makes
sense for the stage-2 program, e.g. a proto id and/or data offset.
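
The split proposed above could be sketched roughly as follows. This is plain C,
not eBPF, and every name in it (struct filter_result, stage1_filter,
stage2_program, run_two_stage, the FILTER_* and VERDICT_* values) is
hypothetical. It assumes the opaque code is the L3 data offset, and the stage-2
policy (drop IPv4 packets with an expired TTL, transmit the rest) is only a
placeholder:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum filter_action { FILTER_DROP, FILTER_PASS, FILTER_MATCH };

struct filter_result {
    enum filter_action action;
    uint32_t opaque;           /* only meaningful with FILTER_MATCH */
};

enum final_verdict { VERDICT_DROP, VERDICT_PASS, VERDICT_TX };

/* Stage 1: parsing only, no helper calls; the part HW might offload. */
struct filter_result stage1_filter(const uint8_t *data, size_t len)
{
    struct filter_result r = { FILTER_DROP, 0 };
    assert(data != NULL);
    if (len < 14)
        return r;              /* shorter than an Ethernet header: early drop */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != 0x0800) { /* not IPv4: leave it to the normal netstack */
        r.action = FILTER_PASS;
        return r;
    }
    r.action = FILTER_MATCH;
    r.opaque = 14;             /* data offset of the IPv4 header */
    return r;
}

/* Stage 2: the "real" program; starts at the offset stage 1 handed over. */
enum final_verdict stage2_program(const uint8_t *data, size_t len,
                                  uint32_t l3_off)
{
    if (len < (size_t)l3_off + 20)
        return VERDICT_DROP;   /* truncated IPv4 header */
    uint8_t ttl = data[l3_off + 8];
    return ttl <= 1 ? VERDICT_DROP : VERDICT_TX;
}

/* Glue: map the stage-1 action onto DROP / PASS / call stage 2. */
enum final_verdict run_two_stage(const uint8_t *data, size_t len)
{
    struct filter_result r = stage1_filter(data, len);
    switch (r.action) {
    case FILTER_DROP:  return VERDICT_DROP;
    case FILTER_PASS:  return VERDICT_PASS;
    case FILTER_MATCH: return stage2_program(data, len, r.opaque);
    }
    return VERDICT_PASS;
}
```

Stage 2 never re-parses the Ethernet header here; it trusts the offset carried
over with MATCH, which is where the saving would come from if stage 1 ran in
hardware.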

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer