Hi Martin
see inline

On Mar 15, 2013, at 11:10 AM, Martin Kummer <[email protected]> wrote:
> Hi Alfredo,
> 
> using hugepages improved the performance considerably (from 85% to 90%,
> for example). Sending bigger (= fewer) packets also helped. What a surprise.
> 
> The machine has 16 GB DDR3-1333 memory. The fact that it's a 32-bit kernel
> (don't ask me why) might be an issue…

A 64-bit system is recommended.

> I assume that with current hardware libzero can't handle much more than
> 15 Mpps at best. Is that correct?

It could be.

> One more question: what, for example, is the difference between "RSS=1,1,1,1"
> and "RSS=1,1"?

It is a per-interface option: with two ports "RSS=1,1", with four ports
"RSS=1,1,1,1", and so on.

Best Regards
Alfredo

> 
> Best Regards
> Martin
> 
> On Thu, Mar 14, 2013 at 08:42:52PM +0100, Alfredo Cardigliano wrote:
>> Hi Martin and Craig,
>> let me clarify a few points:
>> - min_num_slots and transparent_mode are kernel-level settings that apply to
>>   standard rings, not to DNA/Libzero (the kernel is bypassed in that case).
>> - RSS is useful for balancing traffic with DNA, but it has some limitations;
>>   for this reason we developed the Libzero DNA Cluster. The latter can use a
>>   custom distribution function, replacing RSS with a flexible user-defined
>>   function: this means you should load the driver with RSS disabled
>>   (RSS=1,1,1,1, i.e. a single queue per port).
>>
>> That said, the load_dna_driver.sh script we provide with the drivers should
>> be fine for Martin: RSS=1,1,1,1 disables RSS (single queue per port), and you
>> can add num_rx_slots=32768 as suggested by Craig to set the number of NIC
>> slots to the maximum.
>> As Martin said, the master is the real bottleneck, as it is a centralisation
>> point with a computationally intensive task: for each packet it has to read
>> from the NIC, parse, hash, and deliver to the slaves, all within a few clock
>> cycles. This is also the reason for "the more slaves, the lower the RX value
>> gets": more slaves mean more data structures in memory and thus more pressure
>> on the caches. To do this at 10G wire rate you need a good machine, with a
>> good CPU and a good memory hierarchy (your CPU looks fast enough, I can't
>> comment on your memory).
>> Using hugepages will probably help you a bit: have a look at
>> PF_RING/README.hugepages and the -u parameter of pfdnacluster_master.
>>
>> Best Regards
>> Alfredo
>>
>> On Mar 14, 2013, at 7:58 PM, Craig Merchant <[email protected]> wrote:
>>
>>> From my understanding and experience, you don't use RSS with DNA/Libzero.
>>> RSS is limited to 16 queues.
>>>
>>> The major value of using DNA/Libzero is that it lets you use more queues
>>> than RSS.
>>>
>>> Try the settings below... It's been a while since I set this up, but I
>>> remember having some issues that required me to force pf_ring to load
>>> before the ixgbe driver.
>>>
>>> options ixgbe MQ=0,0 num_rx_slots=32768
>>> options pf_ring min_num_slots=65536 transparent_mode=1
>>> install ixgbe /sbin/modprobe pf_ring $CMDLINE_OPTS; /sbin/modprobe --ignore-install ixgbe $CMDLINE_OPTS
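
(A quick way to double-check that pf_ring really came up before ixgbe and with
the intended parameters -- just a sketch, the exact output varies between
PF_RING versions:

    lsmod | grep -E 'pf_ring|ixgbe'      # both modules should be listed
    dmesg | grep -iE 'pf_ring|ixgbe'     # load messages appear in load order
    cat /proc/net/pf_ring/info           # PF_RING version and ring slot settings
    modinfo pf_ring | grep parm          # parameters the pf_ring module accepts
)
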
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Martin Kummer
>>> Sent: Thursday, March 14, 2013 11:18 AM
>>> To: [email protected]
>>> Subject: Re: [Ntop-misc] libzero performance
>>>
>>> Wow, thx for the quick answer.
>>>
>>> I just use the provided script
>>> (drivers/DNA/ixgbe-3.10.16-DNA/src/load_dna_driver.sh) to load the drivers.
>>> In short it does:
>>>
>>>     insmod ../../../../kernel/pf_ring.ko
>>>     insmod ./ixgbe.ko RSS=1,1,1,1
>>>     ifconfig dna1 up
>>>     bash ../scripts/set_irq_affinity.sh ${IF[index]}
>>>
>>> The sysadmin has forbidden me to install these drivers permanently, so
>>> there's nothing in /etc/modprobe.d/*.conf.
>>>
>>> Martin
>>>
>>> On Thu, Mar 14, 2013 at 05:44:48PM +0000, Craig Merchant wrote:
>>>> Martin,
>>>>
>>>> I'm running pfdnacluster_master on an interface that averages between
>>>> 3-10 Gbps. The traffic is copied to 28 queues (0-27). The 28th queue
>>>> contains a copy of all of the traffic. I don't have any issues with
>>>> packets being dropped.
>>>>
>>>> How are you initializing the ixgbe and pf_ring drivers in your
>>>> /etc/modprobe.d/*.conf file? Mine looks something like:
>>>>
>>>> options igb RSS=8,8
>>>> options ixgbe MQ=0,0 num_rx_slots=32768
>>>> options pf_ring min_num_slots=65536 transparent_mode=1
>>>> install ixgbe /sbin/modprobe pf_ring $CMDLINE_OPTS; /sbin/modprobe --ignore-install ixgbe $CMDLINE_OPTS
>>>>
>>>> How are you bringing up the interface? I'm using DNA/Libzero for Snort,
>>>> so I bring up the interface in the Snort init script with something like:
>>>>
>>>> function adapter_settings() {
>>>>     ifconfig dna0 up promisc
>>>>     ethtool -K dna0 tso off &>/dev/null
>>>>     ethtool -K dna0 gro off &>/dev/null
>>>>     ethtool -K dna0 lro off &>/dev/null
>>>>     ethtool -K dna0 gso off &>/dev/null
>>>>     ethtool -G dna0 tx 32768 &>/dev/null
>>>>     ethtool -G dna0 rx 32768 &>/dev/null
>>>> }
>>>>
>>>> Thanks.
>>>>
>>>> Craig
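
(A variant of the same idea for setups that bring up more than one DNA port --
untested sketch, the interface names are just placeholders:

    # apply the same offload and ring settings to every DNA port in one loop
    for ifname in dna0 dna1; do
        ifconfig $ifname up promisc
        for feature in tso gro lro gso; do
            ethtool -K $ifname $feature off &>/dev/null
        done
        ethtool -G $ifname rx 32768 &>/dev/null
        ethtool -G $ifname tx 32768 &>/dev/null
    done
)
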
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Martin Kummer
>>>> Sent: Thursday, March 14, 2013 10:24 AM
>>>> To: [email protected]
>>>> Subject: [Ntop-misc] libzero performance
>>>>
>>>> Hi everyone.
>>>>
>>>> For my bachelor thesis I'm modifying Vermont
>>>> (https://github.com/constcast/vermont/wiki) to use PF_RING/libzero
>>>> instead of pcap.
>>>>
>>>> To test libzero I used one of the examples on the website: on one host
>>>> I ran pfdnacluster_master and pfcount. From another host I sent 100M
>>>> packets at 10 Gbps (ca. 12 Mpps).
>>>>
>>>> There are two issues:
>>>>
>>>> • When I use just one slave (pfdnacluster_master [...] -n 1) the
>>>> performance is very good: I get about 99% of the packets. When I split
>>>> the data between two slaves, that number drops to about 90%. The last
>>>> line of "pfdnacluster_master -n 2" output:
>>>> Absolute Stats: RX 90'276'085 pkts [2'314'438.07 pkt/sec] Processed 90'276'085 pkts [2'314'438.07 pkt/sec]
>>>> The more slaves, the lower the RX value gets.
>>>>
>>>> • Even with just one slave, pfdnacluster_master uses almost 100% of a
>>>> CPU core. While not yet a problem for me, this is likely to be the next
>>>> bottleneck.
>>>>
>>>> Is there a way to increase the performance of a dnacluster when there
>>>> are several slaves?
>>>>
>>>> The software used:
>>>> - the current svn checkout of ntop
>>>> - the DNA ixgbe driver by ntop
>>>>
>>>> The hardware used:
>>>> - Core i7-3930K (6C, HT, 3.2 GHz)
>>>> - Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
>>>>   Network Connection (rev 01)
>>>>   Subsystem: Intel Corporation Ethernet Server Adapter X520-2
>>>>
>>>> best regards,
>>>> Martin
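
For completeness, the test Martin describes can be reproduced roughly as
follows -- a sketch from memory, not verified on this setup; the cluster id
(10), the interface names and the pfsend options are assumptions, so check each
tool's -h output and PF_RING/README.hugepages for the exact syntax:

    # on the capture host: load the drivers (two ports, RSS disabled)
    insmod ./pf_ring.ko
    insmod ./ixgbe.ko RSS=1,1 num_rx_slots=32768
    ifconfig dna0 up

    # reserve and mount hugepages before starting the cluster
    echo 1024 > /proc/sys/vm/nr_hugepages
    mkdir -p /mnt/hugepages
    mount -t hugetlbfs nodev /mnt/hugepages

    # start the master with cluster id 10 and 2 slave queues,
    # with hugepages enabled (-u, see README.hugepages for the argument)
    pfdnacluster_master -i dna0 -c 10 -n 2 -u ...

    # attach one pfcount instance per slave queue
    pfcount -i dnacluster:10@0 &
    pfcount -i dnacluster:10@1 &

    # on the sending host: generate ~100M packets at 10 Gbps with pfsend
    pfsend -i dna0 -n 100000000 -r 10
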
