Hi Martin
see inline

On Mar 15, 2013, at 11:10 AM, Martin Kummer <[email protected]> wrote:
> Hi Alfredo,
> 
> using hugepages improved the performance considerably (from 85% to 90%,
> for example). Sending bigger (= fewer) packets also helped. What a surprise.
> 
> The machine has 16 GB DDR3-1333 memory. The fact that it's a 32-bit kernel
> (don't ask me why) might be an issue…

A 64-bit system is recommended.

> I assume that with current hardware libzero can't handle much more than
> 15 Mpps at best. Is that correct?

It could be.

> One more question: what, for example, is the difference between "RSS=1,1,1,1"
> and "RSS=1,1"?

It is a per-interface option: with two ports "RSS=1,1", with four ports
"RSS=1,1,1,1", and so on.

Best Regards
Alfredo

> 
> Best Regards
> Martin
> 
> On Thu, Mar 14, 2013 at 08:42:52PM +0100, Alfredo Cardigliano wrote:
>> Hi Martin and Craig,
>> let me clarify a few points:
>> - min_num_slots and transparent_mode are kernel-level settings that apply to
>>   standard rings, not to DNA/Libzero (the kernel is bypassed in that case).
>> - RSS is useful for balancing traffic with DNA, but it has some limitations;
>>   for this reason we developed the Libzero DNA Cluster. The latter can use a
>>   custom distribution function, replacing RSS with a flexible user-defined
>>   function: this means you should load the driver with RSS disabled
>>   (RSS=1,1,1,1, i.e. a single queue per port).
>>
>> That said, the load_dna_driver.sh script we provide with the drivers should
>> be fine for Martin: RSS=1,1,1,1 disables RSS (single queue per port), and you
>> can add num_rx_slots=32768 as suggested by Craig to set the number of NIC
>> slots to the maximum.
>> As Martin said, the master is the real bottleneck, as it is a centralisation
>> point with a computationally intensive task: for each packet it has to read
>> from the NIC, parse, hash, and deliver to the slaves, all within a few clock
>> cycles. This is also the reason for "the more slaves, the lower the RX value
>> gets": more slaves mean more data structures in memory and thus more pressure
>> on the caches. To do this at 10G wire rate you need a good machine, with a
>> good CPU and a good memory hierarchy (your CPU looks fast enough, I can't
>> comment on your memory).
>> Using hugepages will probably help you a bit: have a look at
>> PF_RING/README.hugepages and the -u parameter of pfdnacluster_master.
>>
>> Best Regards
>> Alfredo
>>
>> On Mar 14, 2013, at 7:58 PM, Craig Merchant <[email protected]> wrote:
>>
>>> From my understanding and experience, you don't use RSS with DNA/Libzero.
>>> RSS is limited to 16 queues.
>>>
>>> The major value of using DNA/Libzero is that it lets you use more queues
>>> than RSS.
>>>
>>> Try the settings below... It's been a while since I set this up, but I
>>> remember having some issues that required me to force pf_ring to load
>>> before the ixgbe driver.
>>>
>>> options ixgbe MQ=0,0 num_rx_slots=32768
>>> options pf_ring min_num_slots=65536 transparent_mode=1
>>> install ixgbe /sbin/modprobe pf_ring $CMDLINE_OPTS; /sbin/modprobe --ignore-install ixgbe $CMDLINE_OPTS
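
(A quick way to double-check that pf_ring really came up before ixgbe and with
the intended parameters -- just a sketch, the exact output varies between
PF_RING versions:

    lsmod | grep -E 'pf_ring|ixgbe'      # both modules should be listed
    dmesg | grep -iE 'pf_ring|ixgbe'     # load messages appear in load order
    cat /proc/net/pf_ring/info           # PF_RING version and ring slot settings
    modinfo pf_ring | grep parm          # parameters the pf_ring module accepts
)
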
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Martin Kummer
>>> Sent: Thursday, March 14, 2013 11:18 AM
>>> To: [email protected]
>>> Subject: Re: [Ntop-misc] libzero performance
>>>
>>> Wow, thx for the quick answer.
>>>
>>> I just use the provided script
>>> (drivers/DNA/ixgbe-3.10.16-DNA/src/load_dna_driver.sh) to load the drivers.
>>> In short it does:
>>>
>>>     insmod ../../../../kernel/pf_ring.ko
>>>     insmod ./ixgbe.ko RSS=1,1,1,1
>>>     ifconfig dna1 up
>>>     bash ../scripts/set_irq_affinity.sh ${IF[index]}
>>>
>>> The sysadmin has forbidden me to install these drivers permanently, so
>>> there's nothing in /etc/modprobe.d/*.conf.
>>>
>>> Martin
>>>
>>> On Thu, Mar 14, 2013 at 05:44:48PM +0000, Craig Merchant wrote:
>>>> Martin,
>>>>
>>>> I'm running pfdnacluster_master on an interface that averages between
>>>> 3-10 Gbps. The traffic is copied to 28 queues (0-27). The 28th queue
>>>> contains a copy of all of the traffic. I don't have any issues with
>>>> packets being dropped.
>>>>
>>>> How are you initializing the ixgbe and pf_ring drivers in your
>>>> /etc/modprobe.d/*.conf file? Mine looks something like:
>>>>
>>>> options igb RSS=8,8
>>>> options ixgbe MQ=0,0 num_rx_slots=32768
>>>> options pf_ring min_num_slots=65536 transparent_mode=1
>>>> install ixgbe /sbin/modprobe pf_ring $CMDLINE_OPTS; /sbin/modprobe --ignore-install ixgbe $CMDLINE_OPTS
>>>>
>>>> How are you bringing up the interface? I'm using DNA/Libzero for Snort,
>>>> so I bring up the interface in the Snort init script with something like:
>>>>
>>>> function adapter_settings() {
>>>>     ifconfig dna0 up promisc
>>>>     ethtool -K dna0 tso off &>/dev/null
>>>>     ethtool -K dna0 gro off &>/dev/null
>>>>     ethtool -K dna0 lro off &>/dev/null
>>>>     ethtool -K dna0 gso off &>/dev/null
>>>>     ethtool -G dna0 tx 32768 &>/dev/null
>>>>     ethtool -G dna0 rx 32768 &>/dev/null
>>>> }
>>>>
>>>> Thanks.
>>>>
>>>> Craig
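
(A variant of the same idea for setups that bring up more than one DNA port --
untested sketch, the interface names are just placeholders:

    # apply the same offload and ring settings to every DNA port in one loop
    for ifname in dna0 dna1; do
        ifconfig $ifname up promisc
        for feature in tso gro lro gso; do
            ethtool -K $ifname $feature off &>/dev/null
        done
        ethtool -G $ifname rx 32768 &>/dev/null
        ethtool -G $ifname tx 32768 &>/dev/null
    done
)
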
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Martin Kummer
>>>> Sent: Thursday, March 14, 2013 10:24 AM
>>>> To: [email protected]
>>>> Subject: [Ntop-misc] libzero performance
>>>>
>>>> Hi everyone.
>>>>
>>>> For my bachelor thesis I'm modifying Vermont
>>>> (https://github.com/constcast/vermont/wiki) to use PF_RING/libzero
>>>> instead of pcap.
>>>>
>>>> To test libzero I used one of the examples on the website: on one host
>>>> I ran pfdnacluster_master and pfcount. From another host I sent 100M
>>>> packets at 10 Gbps (ca. 12 Mpps).
>>>>
>>>> There are two issues:
>>>>
>>>> • When I use just one slave (pfdnacluster_master [...] -n 1) the
>>>> performance is very good: I get about 99% of the packets. When I split
>>>> the data between two slaves, that number drops to about 90%. The last
>>>> line of "pfdnacluster_master -n 2" output:
>>>> Absolute Stats: RX 90'276'085 pkts [2'314'438.07 pkt/sec] Processed 90'276'085 pkts [2'314'438.07 pkt/sec]
>>>> The more slaves, the lower the RX value gets.
>>>>
>>>> • Even with just one slave, pfdnacluster_master uses almost 100% of a
>>>> CPU core. While not yet a problem for me, this is likely to be the next
>>>> bottleneck.
>>>>
>>>> Is there a way to increase the performance of a dnacluster when there
>>>> are several slaves?
>>>>
>>>> The software used:
>>>> - the current svn checkout of ntop
>>>> - the DNA ixgbe driver by ntop
>>>>
>>>> The hardware used:
>>>> - Core i7-3930K (6C, HT, 3.2 GHz)
>>>> - Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
>>>>   Network Connection (rev 01)
>>>>   Subsystem: Intel Corporation Ethernet Server Adapter X520-2
>>>>
>>>> best regards,
>>>> Martin
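
For completeness, the test Martin describes can be reproduced roughly as
follows -- a sketch from memory, not verified on this setup; the cluster id
(10), the interface names and the pfsend options are assumptions, so check each
tool's -h output and PF_RING/README.hugepages for the exact syntax:

    # on the capture host: load the drivers (two ports, RSS disabled)
    insmod ./pf_ring.ko
    insmod ./ixgbe.ko RSS=1,1 num_rx_slots=32768
    ifconfig dna0 up

    # reserve and mount hugepages before starting the cluster
    echo 1024 > /proc/sys/vm/nr_hugepages
    mkdir -p /mnt/hugepages
    mount -t hugetlbfs nodev /mnt/hugepages

    # start the master with cluster id 10 and 2 slave queues,
    # with hugepages enabled (-u, see README.hugepages for the argument)
    pfdnacluster_master -i dna0 -c 10 -n 2 -u ...

    # attach one pfcount instance per slave queue
    pfcount -i dnacluster:10@0 &
    pfcount -i dnacluster:10@1 &

    # on the sending host: generate ~100M packets at 10 Gbps with pfsend
    pfsend -i dna0 -n 100000000 -r 10
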
