[dpdk-dev] tcpdump support in DPDK 2.3

Bruce Richardson Wed, 16 Dec 2015 13:12:49 +0000

On Wed, Dec 16, 2015 at 01:26:11PM +0100, Morten Brørup wrote:
> Bruce,
> 
> Please note that tcpdump is a stupid name for a packet capture application 
> that supports much more than just TCP.
> 
> I had missed the point about ethdev supporting virtual interfaces, so thank 
> you for pointing that out. That covers my concerns about capturing packets 
> inside tunnels.
> 
> I will gladly admit that you Intel guys are probably much more competent in 
> the field of DPDK performance and scalability than I am. So Matthew and I 
> have been asking you to kindly ensure that your solution scales well at very 
> high packet rates too, and pointing out that filtering before copying is 
> probably cheaper than copying before filtering. You mention that it leads to 
> an important choice about which lcores get to do the work of filtering the 
> packets, so that might be worth some discussion.
> 
> :-)
> 
> Med venlig hilsen / kind regards
> - Morten Brørup
>


Thanks for your support.

We may look at having a certain amount of flexibility in the configuration of
the setup, so as to avoid limiting the use of the functionality.

For scalability at very high packet rates, it's something we'll need you guys
to give us pointers on too - what's acceptable or not inside an app, and what
level of scalability is needed. I'd admit that most of our initial thinking in
this area was for debugging apps at less than line rate, i.e. for functional
testing. For full line rate introspection, we'll have to see when we get some
working code.
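
For a sense of scale, the per-packet time budget at line rate can be worked
out from the link speed and frame size. The sketch below is only a
back-of-envelope helper: line_rate_pps() and ns_per_packet() are names made up
for this example, and it assumes standard Ethernet framing overhead of 20
bytes per frame (preamble, start-of-frame delimiter and inter-frame gap).

```c
#include <assert.h>

/* Per-frame wire overhead: 7 preamble + 1 SFD + 12 inter-frame gap bytes. */
#define ETH_OVERHEAD_BYTES 20u

/* Packets per second on a saturated link; frame_bytes includes the FCS. */
static double line_rate_pps(double link_bps, unsigned frame_bytes)
{
    assert(frame_bytes >= 64);  /* minimum Ethernet frame size */
    return link_bps / (8.0 * (frame_bytes + ETH_OVERHEAD_BYTES));
}

/* Time available to handle one packet, in nanoseconds. */
static double ns_per_packet(double link_bps, unsigned frame_bytes)
{
    return 1e9 / line_rate_pps(link_bps, frame_bytes);
}
```

For a 40 Gbit/s port carrying 64-byte frames this comes to roughly 59.5
million packets per second, i.e. about 16.8 ns per packet for any mirroring
or filtering work.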

/Bruce

> 
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson at intel.com] 
> Sent: 16. december 2015 12:56
> To: Morten Brørup
> Cc: Matthew Hall; Kyle Larose; dev at dpdk.org
> Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3
> 
> On Wed, Dec 16, 2015 at 12:40:43PM +0100, Morten Brørup wrote:
> > Bruce,
> > 
> > This doesn't really sound like tcpdump to me; it sounds like port mirroring.
> 
> It's actually a bit of both, in my opinion: it's designed to allow basic 
> mirroring of traffic on a port so that the traffic can be sent to a tcpdump 
> destination.
> By going with a more generic approach, we hope to enable more possible use 
> cases than just focusing on TCP.
> 
> 
> > 
> > Your suggestion is limited to physical ports only, and cannot be attached 
> > further inside the application, e.g. for mirroring packets related to a 
> > specific VLAN.
> 
> Yes, the lack of attachment inside the app is a limitation. There are two 
> types of scenarios that could be considered for packet capture:
> * ones where the application can be modified to do its own filtering and 
> capturing.
> * ones where you want a generic capture mechanism which can be used on any 
> application without modification.
> We have chosen to focus more on the second one, as that is where a generic 
> solution for DPDK is likely to lie. For the first case, the application 
> writer himself knows the type of traffic and how best to capture and filter 
> it, so I don't think a generic one-size-fits-all solution is possible. 
> [Though a couple of helper libraries may be of use]
> 
> As for physical ports, the scheme should work for any ethdev - why do you see 
> it only being limited to physical ports? What would you want to see monitored 
> that we are missing?
> 
> > 
> > Furthermore, it doesn't sound like the filtering part scales well. Consider 
> > a fully loaded 40 Gbit/s port. You would need to copy all packets into a 
> > single rte_ring to the attached filtering process, which would then require 
> > its own set of lcores to probably discard most of these packets when 
> > filtering. I agree with Matthew that the filtering needs to happen as close 
> > to the source as possible, and must be scalable to multiple lcores.
> 
> Without modifying the application itself to do its own filtering I suspect 
> scalability is always going to be a problem. That being said, there is no 
> particular reason why a single rte_ring needs to be used - we could allow one 
> ring per NIC queue for instance. The trouble with filtering at the source 
> itself is that you put extra load on the IO cores. By using a ring, we put 
> the filtering load on extra cores in a secondary process which can be scaled 
> by the user without touching the main app.
> 
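
To make the one-ring-per-queue idea a bit more concrete, here is a minimal
single-threaded sketch. Everything in it is hypothetical and stands in for the
real thing: struct pkt plays the role of an mbuf, struct mirror_ring the role
of an rte_ring, and the VLAN filter is just an example predicate. It has no
memory barriers, so it only illustrates the data flow and is not a usable
cross-core ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u    /* power of two, so indices can be masked */
#define NUM_QUEUES 2u   /* hypothetical: one mirror ring per Rx queue */

struct pkt {            /* hypothetical stand-in for an mbuf */
    uint16_t vlan;
    uint16_t len;
};

struct mirror_ring {    /* hypothetical stand-in for an rte_ring */
    const struct pkt *slots[RING_SIZE];
    uint32_t head;      /* next slot to write (IO core) */
    uint32_t tail;      /* next slot to read (capture core) */
    uint64_t dropped;   /* packets not mirrored because the ring was full */
};

/* IO-core side: never blocks; a full ring only costs a lost capture. */
static int mirror_enqueue(struct mirror_ring *r, const struct pkt *p)
{
    assert(r != NULL && p != NULL);
    if (r->head - r->tail == RING_SIZE) {
        r->dropped++;
        return -1;
    }
    r->slots[r->head++ & (RING_SIZE - 1)] = p;
    return 0;
}

/* Capture-core side: returns NULL when the ring is empty. */
static const struct pkt *mirror_dequeue(struct mirror_ring *r)
{
    if (r->head == r->tail)
        return NULL;
    return r->slots[r->tail++ & (RING_SIZE - 1)];
}

typedef int (*pkt_filter)(const struct pkt *p);

/* Drain every per-queue ring and keep only packets the filter accepts. */
static unsigned drain_filtered(struct mirror_ring *rings, unsigned nb_rings,
                               pkt_filter keep, const struct pkt **out,
                               unsigned max_out)
{
    unsigned n = 0;

    for (unsigned q = 0; q < nb_rings; q++) {
        const struct pkt *p;

        while (n < max_out && (p = mirror_dequeue(&rings[q])) != NULL)
            if (keep(p))
                out[n++] = p;
    }
    return n;
}

/* Example predicate: keep only traffic tagged with VLAN 100. */
static int vlan_is_100(const struct pkt *p)
{
    return p->vlan == 100;
}
```

The point of the split is that the IO core only pays for the enqueue, while
the filter cost lands on whichever core runs drain_filtered(). The price is
that every packet goes through the ring, including those the filter will
discard.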
> > 
> > On the positive side, your idea has the advantage that the filter can be 
> > any application, and is not limited to BPF. However if the purpose is 
> > "tcpdump", we should probably consider BPF, which is the type of filtering 
> > offered by tcpdump.
> 
> Having this work with any application is one of our primary targets here. The 
> app author should not have to worry too much about getting basic debug 
> support.
> Even if it doesn't work at 40G small packet rates, you can get a lot of 
> benefit from a scheme that provides functional debugging for an app. 
> Obviously, though, we aim to make this as scalable as possible, which is why 
> we want to allow filtering in userspace before sending packets externally to 
> DPDK.
> 
> > 
> > I would prefer having a BPF library available that the application can use 
> > at any point, either at the lowest level (when receiving/transmitting 
> > Ethernet packets) or at a higher level (e.g. when working with packets that 
> > go into or come out of a tunnel). The BPF library should implement packet 
> > length and relevant ancillary data, such as SKF_AD_VLAN_TAG etc. based on 
> > metadata in the mbuf.
> > 
> > Transferring a BPF filter from an outside application could be done by 
> > using a simple text format, e.g. the output format of "tcpdump -ddd". This 
> > also opens an easy roadmap for Wireshark integration by simply extending 
> > extcap to include such a BPF filter format.
> > 
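
As a rough illustration of the text format suggested above: "tcpdump -ddd"
prints the instruction count on the first line, followed by one line per
instruction holding the decimal code, jt, jf and k values. A minimal parser
sketch could look like the following. The struct filter_insn type and the
parse_ddd() function are hypothetical names made up for this example and are
not part of DPDK or libpcap; the struct merely mirrors the classic BPF
instruction layout.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical type mirroring the classic BPF instruction layout. */
struct filter_insn {
    uint16_t code;  /* opcode */
    uint8_t jt;     /* jump offset if true */
    uint8_t jf;     /* jump offset if false */
    uint32_t k;     /* generic operand */
};

/*
 * Parse "tcpdump -ddd" style text: a count line, then "code jt jf k" lines.
 * Returns the number of instructions stored in out, or -1 if the text is
 * malformed, truncated, or holds more than max_insns instructions.
 */
static int parse_ddd(const char *text, struct filter_insn *out, int max_insns)
{
    int count, used;

    assert(text != NULL && out != NULL);
    if (sscanf(text, "%d%n", &count, &used) != 1 ||
        count <= 0 || count > max_insns)
        return -1;
    text += used;

    for (int i = 0; i < count; i++) {
        unsigned code, jt, jf;
        uint32_t k;

        if (sscanf(text, "%u %u %u %" SCNu32 "%n",
                   &code, &jt, &jf, &k, &used) != 4)
            return -1;
        if (code > UINT16_MAX || jt > UINT8_MAX || jf > UINT8_MAX)
            return -1;
        out[i].code = (uint16_t)code;
        out[i].jt = (uint8_t)jt;
        out[i].jf = (uint8_t)jf;
        out[i].k = k;
        text += used;
    }
    return count;
}
```

The parser only validates field ranges. It does not check that jump offsets
stay inside the program or that the program ends in a return, which a real
loader would need to do before running a filter received from outside.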
> > 
> > Lots of negativity above. I very much like the idea of attaching the 
> > secondary process and going through an rte_ring. This allows the secondary 
> > process to pass the filtered and captured packets on in any format it likes 
> > to any destination it likes.
> 
> Good, so we're not completely off-base here. :-)
> 
> /Bruce
> 
> > 
> > 
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> > 
> > -----Original Message-----
> > From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > Sent: 16. december 2015 11:45
> > 
> > Hi,
> > 
> > we are currently doing some investigation and prototyping for this feature.
> > Our current thinking is the following:
> > * to allow dynamic control of the filtering, we are thinking of making use
> >   of the multi-process infrastructure in DPDK. A secondary process can
> >   attach to a primary at runtime and provide the packet filtering and
> >   dumping capability.
> > * ideally we want to create a generic packet mirroring callback inside the
> >   EAL, that can be set up to mirror packets going through Rx/Tx on an
> >   ethdev.
> > * using this, packets being received on the port to be monitored are sent
> >   via an rte_ring (ring ethdev) to the secondary process which takes those
> >   packets and does any filtering on them. [This would be where BPF could
> >   fit into things, but it's not something we have looked at yet.]
> > * initially we plan to have the secondary process then write packets to a
> >   pcap file using a pcap PMD, but down the road if we get other PMDs, like
> >   a KNI PMD or a TAP device PMD, those could be used as targets instead.
> > 
> > This implementation we hope should provide enough hooks to enable the 
> > standard tools to be used for monitoring and capturing packets. We will 
> > send out draft implementation code for various parts of this as soon as we 
> > have it.
> > 
> > Additional feedback welcome, as always. :-)
> > 
> > Regards,
> > /Bruce
> > 
> > 
>
