Hi Alfredo,

Using hugepages improved the performance considerably (e.g. from 85% to 90%).
Sending bigger (and therefore fewer) packets also helped. What a surprise.
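
For the record, "using hugepages" on my side was just the standard hugetlbfs 
setup from README.hugepages plus the -u option of pfdnacluster_master; the 
page count and mount point below are only examples:

echo 1024 > /proc/sys/vm/nr_hugepages   # reserve 2 MB hugepages
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge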

The machine has 16 GB DDR3-1333 memory. The fact that it's a 32-bit kernel 
(don't ask me why) might be an issue...

I assume that with current hardware libzero can't handle much more than 
15 Mpps at best. Is that correct?

One more question: what, for example, is the difference between 
"RSS=1,1,1,1" and "RSS=1,1"?

Best Regards
Martin


On Thu, Mar 14, 2013 at 08:42:52PM +0100, Alfredo Cardigliano wrote:
> Hi Martin and Craig,
> let me clarify a few points:
> - min_num_slots and transparent_mode are kernel-level settings that apply to 
> standard rings, not to DNA/Libzero (the kernel is bypassed in that case).
> - RSS is useful for balancing traffic with DNA, but it has some limitations; 
> for this reason we developed the Libzero DNA Cluster. The latter can use a 
> custom distribution function, replacing RSS with a flexible user-defined 
> function: this means you should load the driver with RSS disabled 
> (RSS=1,1,1,1).
> 
> That said, the load_dna_driver.sh script we provide with the drivers should 
> be fine for Martin: RSS=1,1,1,1 will disable RSS (single queue), and you can 
> add num_rx_slots=32768, as suggested by Craig, to set the number of NIC 
> slots to the maximum.
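> 
> I.e., the insmod line in load_dna_driver.sh would become something like:
> 
> insmod ./ixgbe.ko RSS=1,1,1,1 num_rx_slots=32768
> 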
> As Martin said, the master is the real bottleneck, as it is a centralisation 
> point with a computationally intensive task: for each packet it has to read 
> from the NIC, parse, hash, and deliver to the slaves, all within a few clock 
> cycles. This is also the reason for "The more slaves, the lower the RX value 
> gets": more slaves mean more data structures in memory and thus more stress 
> on the cache. To do this at 10G wire rate you need a good machine, with a 
> good CPU and a good memory hierarchy (your CPU actually looks fast enough; 
> I can't comment on your memory).
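> 
> It also usually pays to keep the master and each slave on its own physical 
> core. A sketch (core ids, cluster id and flags are only illustrative, 
> following the demo applications):
> 
> taskset -c 1 pfdnacluster_master -i dna0 -c 10 -n 2 &
> taskset -c 2 pfcount -i dnacluster:10@0 &
> taskset -c 3 pfcount -i dnacluster:10@1 &
> 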
> Using hugepages can probably also help a bit: have a look at 
> PF_RING/README.hugepages and the -u parameter of pfdnacluster_master.
> 
> Best Regards
> Alfredo
> 
> On Mar 14, 2013, at 7:58 PM, Craig Merchant <[email protected]> wrote:
> 
> > In my understanding and experience, you don't use RSS with DNA/Libzero; 
> > RSS is limited to 16 queues.
> > 
> > The major value of using DNA/Libzero is that it lets you use more queues 
> > than RSS.
> > 
> > Try the settings below...  It's been a while since I set this up, but I 
> > remember having some issues that required me to force pf_ring to load 
> > before the ixgbe driver.  
> > 
> > options ixgbe MQ=0,0 num_rx_slots=32768
> > options pf_ring min_num_slots=65536 transparent_mode=1
> > install ixgbe /sbin/modprobe pf_ring $CMDLINE_OPTS; /sbin/modprobe --ignore-install ixgbe $CMDLINE_OPTS
> > 
> > -----Original Message-----
> > From: [email protected] 
> > [mailto:[email protected]] On Behalf Of Martin Kummer
> > Sent: Thursday, March 14, 2013 11:18 AM
> > To: [email protected]
> > Subject: Re: [Ntop-misc] libzero performance
> > 
> > Wow, thanks for the quick answer.
> > 
> > I just use the provided script 
> > (drivers/DNA/ixgbe-3.10.16-DNA/src/load_dna_driver.sh) to load the drivers. 
> > In short, it does:
> > insmod ../../../../kernel/pf_ring.ko               # pf_ring first
> > insmod ./ixgbe.ko RSS=1,1,1,1                      # then the DNA ixgbe driver
> > ifconfig dna1 up
> > bash ../scripts/set_irq_affinity.sh ${IF[index]}   # spread IRQs over the cores
> > 
> > The sysadmin has forbidden me from installing these drivers permanently, 
> > so there's nothing in /etc/modprobe.d/*.conf
> > 
> > Martin
> > 
> > 
> > On Thu, Mar 14, 2013 at 05:44:48PM +0000, Craig Merchant wrote:
> >> Martin,
> >> 
> >> I'm running pfdnacluster_master on an interface that averages between 3 
> >> and 10 Gbps.  The traffic is copied to 28 queues (0-27); the last queue 
> >> (27) contains a copy of all of the traffic.  I don't have any issues with 
> >> packets being dropped.
> >> 
> >> How are you initializing the ixgbe and pf_ring drivers in your 
> >> /etc/modprobe.d/*.conf file?  Mine looks something like:
> >> 
> >> options igb RSS=8,8
> >> options ixgbe MQ=0,0 num_rx_slots=32768
> >> options pf_ring min_num_slots=65536 transparent_mode=1
> >> install ixgbe /sbin/modprobe pf_ring $CMDLINE_OPTS; /sbin/modprobe --ignore-install ixgbe $CMDLINE_OPTS
> >> 
> >> How are you bringing up the interface?  I'm using DNA/Libzero for Snort, 
> >> so I bring up the interface in the Snort init script with something like:
> >> 
> >> 
> >> function adapter_settings() {
> >>    ifconfig dna0 up promisc
> >>    # disable offloads that merge or split packets
> >>    ethtool -K dna0 tso off &>/dev/null
> >>    ethtool -K dna0 gro off &>/dev/null
> >>    ethtool -K dna0 lro off &>/dev/null
> >>    ethtool -K dna0 gso off &>/dev/null
> >>    # maximize the ring sizes
> >>    ethtool -G dna0 tx 32768 &>/dev/null
> >>    ethtool -G dna0 rx 32768 &>/dev/null
> >> }
> >> 
> >> Thanks.
> >> 
> >> Craig
> >> 
> >> -----Original Message-----
> >> From: [email protected] 
> >> [mailto:[email protected]] On Behalf Of Martin 
> >> Kummer
> >> Sent: Thursday, March 14, 2013 10:24 AM
> >> To: [email protected]
> >> Subject: [Ntop-misc] libzero performance
> >> 
> >> Hi everyone.
> >> 
> >> For my bachelor thesis I'm modifying Vermont 
> >> (https://github.com/constcast/vermont/wiki) to use PF_RING/libzero instead 
> >> of pcap.
> >> 
> >> To test libzero I used one of the examples from the website: on one host 
> >> I ran pfdnacluster_master and pfcount, and from another host I sent 100M 
> >> packets at 10 Gbps (ca. 12 Mpps).
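> >> 
> >> Concretely, something like this on the receiving host (the cluster id 
> >> and flags are as in the demo examples):
> >> 
> >> ./pfdnacluster_master -i dna0 -c 10 -n 1
> >> ./pfcount -i dnacluster:10@0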
> >> 
> >> There are two issues:
> >> • When I use just one slave (pfdnacluster_master [...] -n 1) the 
> >> performance is very good: I get about 99% of the packets. When I split the 
> >> data between two slaves, that number drops to about 90%. The last line of 
> >> the "pfdnacluster_master -n 2" output:
> >> 
> >>    Absolute Stats: RX 90'276'085 pkts [2'314'438.07 pkt/sec] Processed 90'276'085 pkts [2'314'438.07 pkt/sec]
> >> 
> >> The more slaves, the lower the RX value gets.
> >> 
> >> • Even with just one slave, pfdnacluster_master uses almost 100% of a CPU 
> >> core. While not yet a problem for me, this is likely to be the next 
> >> bottleneck.
> >> 
> >> Is there a way to increase the performance of a dnacluster when there are 
> >> several slaves?
> >> 
> >> The software used:
> >> - the current svn checkout of ntop
> >> - the DNA ixgbe driver by ntop
> >> 
> >> The hardware used:
> >> - Core i7-3930K (6C, HT, 3.2 GHz)
> >> - Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ 
> >> Network Connection (rev 01)
> >>    Subsystem: Intel Corporation Ethernet Server Adapter X520-2
> >> 
> >> best regards,
> >> Martin
> 
_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
