Luca,

I think we may be in agreement. Zero-copy means that sniffing a single 10GigE
link stays well below any memory bus bandwidth limit, so memory simply won't
be the bottleneck in any real-world application. But suppose the application
is trying to sniff four saturated 10GigE links, so the input network
bandwidth is 40Gbps. That is 5GB/s of data to process, and it will require
at least that much memory bandwidth. High-end servers might have memory bus
architectures capable of that, but there would be little memory bandwidth
left over for the application to do any real packet analysis.
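To make sure I'm doing the arithmetic the same way you are, here is the
back-of-the-envelope model I have in mind (the figures and the little helper
are mine, purely illustrative):

```python
# Back-of-the-envelope memory bus bandwidth estimate (illustrative only).

LINKS = 4          # number of saturated 10GigE links
LINK_GBPS = 10     # line rate per link, in gigabits/sec

input_gbps = LINKS * LINK_GBPS   # 40 Gbps aggregate input
input_gBps = input_gbps / 8      # 5 GB/s of packet data to move

def bus_bandwidth(copies):
    """Rough bus traffic if the packet stream is moved `copies` times."""
    return copies * input_gBps

print(bus_bandwidth(1))  # 5.0  GB/s -- zero-copy: one pass over the data
print(bus_bandwidth(2))  # 10.0 GB/s -- one extra packet copy doubles it
```

This ignores the read-plus-write cost of each copy and cache effects, so the
real numbers are worse, but it captures why the extra copy matters.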

Theoretically one could imagine a system architecture where the NIC has its
own dedicated ring buffer and that memory is exposed directly to the
application. That would eliminate the one packet copy that is currently
performed (even with libzero) to move the packet from the NIC's memory into
the server's primary memory. Then the only memory bandwidth consumed would
be the bandwidth needed to analyze the packets.

Again, am I making any gross errors here?

Sorry if this seems either obvious or pedantic; I just want to make sure
that I am reasoning correctly about the total memory bus bandwidth required
by our application.

Thanks,
Jim


On Fri, Sep 14, 2012 at 10:30 AM, Luca Deri <[email protected]> wrote:

> Jim,
> I am not sure I can follow you 100%.
>
> All I say is that with PF_RING "vanilla" there is no zero-copy, we have to
> deal with the driver and the kernel and thus the performance is not line
> rate.
>
> With DNA, the application accesses the packet directly in memory with no
> need to copy. As we do prefetching in DNA, basically you have the packet in
> memory ready to go. And this is the whole packet, not just the header. For
> instance we have developed a packet-to-disk application that captures and
> writes to disk a 10G stream with no loss. Writing entire packets to disk
> means that you stress the memory (and the disk) a lot, similar to
> switching packets across interfaces, again at line rate.
>
> So perhaps we can't offer you a free lunch, but zero-copy allows you to
> handle 10G for sure on a modern machine with decent memory bandwidth
> (e.g. Sandy Bridge or better).
>
> Cheers Luca
>
>
> On Sep 14, 2012, at 7:17 PM, Jim Lloyd <[email protected]> wrote:
>
> > Can you please confirm my understanding of how memory bandwidth affects
> the ability of PF_RING to do full packet capture at wire speeds?
> >
> > Assume we want to capture data from a saturated 10GigE link. That
> corresponds to 10 billion bits/sec, or about 1.25 billion bytes/sec.
> >
> > Packets arrive in the NIC and must be copied to the ring buffer, so the
> system will consume 1.25GBps of memory bandwidth, even if libzero is used.
> Correct? Even with DMA, memory bandwidth is consumed.
> >
> > If the PF_RING enabled libpcap is used, there will be one extra copy per
> packet to deliver packet to the userland application, requiring 2.5GBps of
> memory bandwidth. Correct?
> >
> > If we ignore L1/L2/L3 caches, current servers will typically have more
> than 2.5GBps of memory bandwidth available, but in practice, actual
> bandwidth is limited at about that amount. So, eliminating that extra
> packet copy is indeed the difference between consuming about 50% of the
> memory bandwidth vs. consuming about 100% of the memory bandwidth. But
> libzero won't drop the memory bandwidth consumed by packet capture down to
> 0%. There is no free lunch.
> >
> > Have I made any gross errors here?
> >
> > Thanks.
> >
> > _______________________________________________
> > Ntop-misc mailing list
> > [email protected]
> > http://listgateway.unipi.it/mailman/listinfo/ntop-misc
>
>