Hi Craig, what do you mean by "Pfcount says that the queue that argus is running on is only dropping 0.1% of packets"? You should look at the stats on the queue argus is using. Select/poll are not supported by the cluster: in our experience, usleep behaves better than the poll implementation in this case.
Alfredo

On Jul 16, 2013, at 1:51 AM, Craig Merchant <[email protected]> wrote:

> I’m trying to troubleshoot some issues with the argus netflow tool running on
> top of pfdnacluster_master. Pfcount says that the queue that argus is
> running on is only dropping 0.1% of packets, yet argus can’t figure out the
> direction of about 60% of the flows. That means for some reason it isn’t
> seeing the SYN and SYNACK of a lot of flows.
>
> The argus developer had a couple of questions about pfdnacluster_master that
> I can’t answer… They are below.
>
> Thanks.
>
> Craig
>
> From: Carter Bullard [mailto:[email protected]]
> Sent: Monday, July 15, 2013 3:13 PM
> To: Craig Merchant
> Cc: Argus ([email protected])
> Subject: Re: [ARGUS] Direction and IP/TCP timeout settings
>
> Hey Craig,
> If radium doesn't keep up, the argi will drop the connections,
> so unless you see radium losing its connection and
> then re-establishing, I don't think it's radium. We can measure
> all of this, so it's not going to be hard to track down, I don't
> think.
>
> If argus is generating the same number of flows, then it's probably
> seeing the same traffic. So it seems that we are not getting all
> the packets, and it doesn't appear to be due to argus running
> out of cycles. Are we running out of memory? How does vmstat look
> on the machine? Not swapping out?
>
> To understand this issue, I need to know whether the pfdnacluster_master
> queue is a selectable packet source or not. We want to use select() to get
> packets, so that we can leverage select()'s timeout feature to wake
> us up periodically to do some background maintenance, like queue
> timeouts, etc…
>
> When we can't select(), we have to poll the interface, and if
> there isn't anything there, we fall into a nanosleep() call,
> waiting for packets. That may be a very bad thing that could cause
> us to lose packets.
>
> Does the pfdnacluster_master queue provide standard pcap_stats()?
> We should be able to look at the MARs, which will tell us how
> many packets the interface dropped.
>
> Not sure that I understand the problem with multiple argus processes?
> You can run 24 copies of argus, and have radium connect to them
> all to recreate the single argus data stream, if that is something
> you would like to do.
>
> Let's focus on this new interface. It could be we have to do something
> special to get the best performance out of it.
>
> Carter
>
> On Jul 15, 2013, at 5:34 PM, Craig Merchant <[email protected]> wrote:
>
> The DNA/libzero drivers only allow a single process to connect to the
> “queues” that the pfdnacluster_master app presents. The default version of
> their app will allow you to copy the same flow to multiple queues, but then
> we’d need to run 28 snort instances and 28 argus instances. From my
> experience, argus wasn’t burning that much CPU, so I opted to take advantage
> of the work Chris Wakelin did in modifying pfdnacluster_master so that it
> created a single queue with a copy of all the traffic.
>
> Here’s the weird thing... When argus is listening to the dna0 interface
> directly, its CPU probably runs at 30-40%. But when I run it on the
> pfdnacluster_master queue, the CPU probably runs at about half that.
>
> Yet when I look at the count of flow records for running argus on the DNA
> interface vs. the pfdnacluster_master queue, the volume of records is about
> the same. It’s tough to test, though, because our traffic volume is pretty
> variable depending on when customers launch their campaigns. The only way to
> test it for sure would be to wire the second 10G interface into the Gigamon
> tap, send a copy of the traffic there, and then run one instance of argus on
> the interface and one on pfdnacluster_master and compare them.
>
> Is it possible that radium is getting overwhelmed?
> The two argi that it
> connects to probably do an aggregate volume of 5-15 Gbps… Since there is a
> fair bit of traffic between data centers, the dedup features of radium are
> helpful. If so, how do I troubleshoot that?
>
> I might be able to put a copy of the non-pf_ring ixgbe driver on the sensor
> and see how that impacts things.
>
> Thanks for all your help!
>
> Craig
_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
