Bruce,

Please note that tcpdump is a stupid name for a packet capture application that 
supports much more than just TCP.

I had missed the point about ethdev supporting virtual interfaces, so thank you 
for pointing that out. That covers my concerns about capturing packets inside 
tunnels.

I will gladly admit that you Intel guys are probably much more competent in the 
field of DPDK performance and scalability than I am. So Matthew and I have been 
asking you to kindly ensure that your solution scales well at very high packet 
rates too, and pointing out that filtering before copying is probably cheaper 
than copying before filtering. You mention that it leads to an important choice 
about which lcores get to do the work of filtering the packets, so that might 
be worth some discussion.
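
To make the filter-before-copy point concrete, here is a minimal sketch in
plain C. It makes no DPDK calls: struct pkt, pkt_filter_fn, match_vlan_100 and
capture_filter_then_copy are hypothetical names invented for this illustration,
and struct pkt merely stands in for an mbuf. It only shows that packets
rejected by the filter never reach the memcpy; it says nothing about which
lcore should run the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: frame bytes plus length. */
struct pkt {
    const uint8_t *data;
    size_t len;
};

/* Hypothetical filter signature: nonzero means "capture this packet". */
typedef int (*pkt_filter_fn)(const struct pkt *p);

/* Example filter: 802.1Q tagged Ethernet frames carrying VLAN ID 100. */
static int match_vlan_100(const struct pkt *p)
{
    if (p->len < 16)
        return 0;
    /* TPID 0x8100 sits at offsets 12..13, the TCI at offsets 14..15. */
    if (p->data[12] != 0x81 || p->data[13] != 0x00)
        return 0;
    uint16_t tci = (uint16_t)((p->data[14] << 8) | p->data[15]);
    return (tci & 0x0FFF) == 100;
}

/*
 * Filter first, then copy only the matching packets into dst.
 * Returns the number of packets copied and stores the number of bytes
 * copied in *bytes_copied. Stops early if dst runs out of space.
 */
static size_t capture_filter_then_copy(const struct pkt *burst, size_t n,
                                       pkt_filter_fn filter,
                                       uint8_t *dst, size_t dst_cap,
                                       size_t *bytes_copied)
{
    size_t used = 0, captured = 0;

    for (size_t i = 0; i < n; i++) {
        if (!filter(&burst[i]))
            continue;               /* rejected: no copy cost at all */
        if (burst[i].len > dst_cap - used)
            break;                  /* capture buffer is full */
        memcpy(dst + used, burst[i].data, burst[i].len);
        used += burst[i].len;
        captured++;
    }
    *bytes_copied = used;
    return captured;
}
```

Given a burst of one untagged frame and one frame tagged with VLAN 100, only
the tagged frame is copied; the untagged frame costs a few byte comparisons
and no memcpy.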

:-)

Med venlig hilsen / kind regards
- Morten Brørup


-----Original Message-----
From: Bruce Richardson [mailto:bruce.richard...@intel.com] 
Sent: 16. december 2015 12:56
To: Morten Brørup
Cc: Matthew Hall; Kyle Larose; dev at dpdk.org
Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3

On Wed, Dec 16, 2015 at 12:40:43PM +0100, Morten Brørup wrote:
> Bruce,
> 
> This doesn't really sound like tcpdump to me; it sounds like port mirroring.

It's actually a bit of both, in my opinion. It's designed to allow basic
mirroring of traffic on a port, so that the traffic can be sent to a tcpdump
destination.
By going with a more generic approach, we hope to enable more possible use 
cases than just focusing on TCP.


> 
> Your suggestion is limited to physical ports only, and cannot be attached 
> further inside the application, e.g. for mirroring packets related to a 
> specific VLAN.

Yes, the lack of attachment inside the app is a limitation. There are two types 
of scenarios that could be considered for packet capture:
* ones where the application can be modified to do its own filtering and 
capturing.
* ones where you want a generic capture mechanism which can be used on any 
application without modification.
We have chosen to focus more on the second one, as that is where a generic 
solution for DPDK is likely to lie. For the first case, the application writer 
himself knows the type of traffic and how best to capture and filter it, so I 
don't think a generic one-size-fits-all solution is possible. [Though a couple 
of helper libraries may be of use]

As for physical ports, the scheme should work for any ethdev - why do you see 
it only being limited to physical ports? What would you want to see monitored 
that we are missing?

> 
> Furthermore, it doesn't sound like the filtering part scales well. Consider a 
> fully loaded 40 Gbit/s port. You would need to copy all packets into a single 
> rte_ring to the attached filtering process, which would then require its own 
> set of lcores to probably discard most of these packets when filtering. I 
> agree with Matthew that the filtering needs to happen as close to the source 
> as possible, and must be scalable to multiple lcores.

Without modifying the application itself to do its own filtering I suspect 
scalability is always going to be a problem. That being said, there is no 
particular reason why a single rte_ring needs to be used - we could allow one 
ring per NIC queue for instance. The trouble with filtering at the source 
itself is that you put extra load on the IO cores. By using a ring, we put the 
filtering load on extra cores in a secondary process which can be scaled by the 
user without touching the main app.

> 
> On the positive side, your idea has the advantage that the filter can be any 
> application, and is not limited to BPF. However if the purpose is "tcpdump", 
> we should probably consider BPF, which is the type of filtering offered by 
> tcpdump.

Having this work with any application is one of our primary targets here. The 
app author should not have to worry too much about getting basic debug support.
Even if it doesn't work at 40G small packet rates, you can get a lot of benefit 
from a scheme that provides functional debugging for an app. Obviously, though, 
we aim to make this as scalable as possible, which is why we want to allow 
filtering in userspace before sending packets externally to DPDK.

> 
> I would prefer having a BPF library available that the application can use at 
> any point, either at the lowest level (when receiving/transmitting Ethernet 
> packets) or at a higher level (e.g. when working with packets that go into or 
> come out of a tunnel). The BPF library should implement packet length and 
> relevant ancillary data, such as SKF_AD_VLAN_TAG etc. based on metadata in 
> the mbuf.
> 
> Transferring a BPF filter from an outside application could be done by using 
> a simple text format, e.g. the output format of "tcpdump -ddd". This also 
> opens an easy roadmap for Wireshark integration by simply extending extcap to 
> include such a BPF filter format.
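
For reference, the "tcpdump -ddd" text format mentioned above is a line with
the instruction count, followed by one line per instruction holding four
decimal numbers: code, jt, jf and k. A minimal sketch of a reader for that
format, in plain C with no libpcap or DPDK dependency, could look like the
following. struct cbpf_insn and parse_ddd are hypothetical names invented for
this illustration; the struct simply mirrors the field layout of a classic BPF
instruction.

```c
#include <stdint.h>
#include <stdio.h>

/* Local copy of the classic BPF instruction layout (hypothetical name). */
struct cbpf_insn {
    uint16_t code;  /* opcode */
    uint8_t  jt;    /* jump offset if true */
    uint8_t  jf;    /* jump offset if false */
    uint32_t k;     /* generic operand */
};

/*
 * Parse "tcpdump -ddd" style text into out[].
 * Returns the number of instructions parsed, or -1 if the input is
 * malformed, a field is out of range, or the program exceeds max.
 */
static int parse_ddd(const char *text, struct cbpf_insn *out, int max)
{
    int count = 0, consumed = 0;

    /* First line: number of instructions that follow. */
    if (sscanf(text, "%d%n", &count, &consumed) != 1)
        return -1;
    if (count <= 0 || count > max)
        return -1;
    text += consumed;

    for (int i = 0; i < count; i++) {
        unsigned int code, jt, jf, k;

        /* Each instruction line: "code jt jf k", all decimal. */
        if (sscanf(text, "%u %u %u %u%n", &code, &jt, &jf, &k, &consumed) != 4)
            return -1;
        if (code > 0xFFFF || jt > 0xFF || jf > 0xFF)
            return -1;

        out[i].code = (uint16_t)code;
        out[i].jt = (uint8_t)jt;
        out[i].jf = (uint8_t)jf;
        out[i].k = (uint32_t)k;
        text += consumed;
    }
    return count;
}
```

A four-instruction program such as "4", "40 0 0 12", "21 0 1 2048",
"6 0 0 262144", "6 0 0 0" (one entry per line) parses into four instructions.
The parsed program would still need validation and an interpreter or JIT
before being run against packet data; neither is shown here.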
> 
> 
> Lots of negativity above. I very much like the idea of attaching the 
> secondary process and going through an rte_ring. This allows the secondary 
> process to pass the filtered and captured packets on in any format it likes 
> to any destination it likes.

Good, so we're not completely off-base here. :-)

/Bruce

> 
> 
> Med venlig hilsen / kind regards
> - Morten Brørup
> 
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> Sent: 16. december 2015 11:45
> 
> Hi,
> 
> we are currently doing some investigation and prototyping for this feature.
> Our current thinking is the following:
> * to allow dynamic control of the filtering, we are thinking of making use of
>   the multi-process infrastructure in DPDK. A secondary process can attach to
>   a primary at runtime and provide the packet filtering and dumping capability.
> * ideally we want to create a generic packet mirroring callback inside the
>   EAL, that can be set up to mirror packets going through Rx/Tx on an ethdev.
> * using this, packets being received on the port to be monitored are sent via
>   an rte_ring (ring ethdev) to the secondary process which takes those packets
>   and does any filtering on them. [This would be where BPF could fit into
>   things, but it's not something we have looked at yet.]
> * initially we plan to have the secondary process then write packets to a pcap
>   file using a pcap PMD, but down the road if we get other PMDs, like a KNI
>   PMD or a TAP device PMD, those could be used as targets instead.
> 
> This implementation we hope should provide enough hooks to enable the 
> standard tools to be used for monitoring and capturing packets. We will send 
> out draft implementation code for various parts of this as soon as we have it.
> 
> Additional feedback welcome, as always. :-)
> 
> Regards,
> /Bruce
> 
> 
