Hi Craig, what do you mean by "Pfcount says that the queue that argus is running on is only dropping 0.1% of packets"? You should look at the stats on the queue argus is using. Select/poll are not supported by the cluster: in our experience, usleep behaves better than the poll implementation in this case.
Alfredo

On Jul 16, 2013, at 1:51 AM, Craig Merchant <[email protected]> wrote:

> I’m trying to troubleshoot some issues with the argus netflow tool running on
> top of pfdnacluster_master. Pfcount says that the queue that argus is
> running on is only dropping 0.1% of packets, yet argus can’t figure out the
> direction of about 60% of the flows. That means for some reason it isn’t
> seeing the SYN and SYNACK of a lot of flows.
>
> The argus developer had a couple of questions about pfdnacluster_master that
> I can’t answer… They are below.
>
> Thanks.
>
> Craig
>
> From: Carter Bullard [mailto:[email protected]]
> Sent: Monday, July 15, 2013 3:13 PM
> To: Craig Merchant
> Cc: Argus ([email protected])
> Subject: Re: [ARGUS] Direction and IP/TCP timeout settings
>
> Hey Craig,
> If radium doesn't keep up, the argi will drop the connections,
> so unless you see radium losing its connection and
> then re-establishing, I don't think it's radium. We can measure
> all of this, so it's not going to be hard to track down, I don't
> think.
>
> If argus is generating the same number of flows, then it's probably
> seeing the same traffic. So it seems that we are not getting all
> the packets, and it doesn't appear to be due to argus running
> out of cycles. Are we running out of memory? How does vmstat look
> on the machine? Not swapping out?
>
> To understand this issue, I need to know whether the pfdnacluster_master
> queue is a selectable packet source or not. We want to use select() to get
> packets, so that we can leverage select()'s timeout feature to wake
> us up periodically to do some background maintenance, like queue
> timeouts, etc…
>
> When we can't select(), we have to poll the interface, and if
> there isn't anything there, we fall into a nanosleep() call,
> waiting for packets. That may be a very bad thing that could cause
> us to lose packets.
>
> Does the pfdnacluster_master queue provide standard pcap_stats()?
> We should be able to look at the MARs, which will tell us how
> many packets the interface dropped.
>
> Not sure that I understand the problem with multiple argus processes?
> You can run 24 copies of argus, and have radium connect to them
> all to recreate the single argus data stream, if that is something
> you would like to do.
>
> Let's focus on this new interface. It could be we have to do something
> special to get the best performance out of it.
>
> Carter
>
> On Jul 15, 2013, at 5:34 PM, Craig Merchant <[email protected]> wrote:
>
> The DNA/libzero drivers only allow a single process to connect to the
> “queues” that the pfdnacluster_master app presents. The default version of
> their app will allow you to copy the same flow to multiple queues, but then
> we’d need to run 28 snort instances and 28 argus instances. From my
> experience, argus wasn’t burning that much CPU, so I opted to take advantage
> of the work Chris Wakelin did in modifying pfdnacluster_master so that it
> created a single queue with a copy of all the traffic.
>
> Here’s the weird thing... When argus is listening to the dna0 interface
> directly, its CPU probably runs at 30-40%. But when I run it on the
> pfdnacluster_master queue, the CPU probably runs at about half that.
>
> Yet when I look at the count of flow records for running argus on the DNA
> interface vs. the pfdnacluster_master queue, the volume of records is about
> the same. It’s tough to test, though, because our traffic volume is pretty
> variable depending on when customers launch their campaigns. The only way to
> test it for sure would be to wire the second 10G interface into the Gigamon
> tap, send a copy of the traffic there, and then run one instance of argus on
> the interface and one on pfdnacluster_master and compare them.
>
> Is it possible that radium is getting overwhelmed?
> The two argi that it
> connects to probably do an aggregate volume of 5-15 Gbps… Since there is a
> fair bit of traffic between data centers, the dedup features of radium are
> helpful. If so, how do I troubleshoot that?
>
> I might be able to put a copy of the non-pf_ring ixgbe driver on the sensor
> and see how that impacts things.
>
> Thanks for all your help!
>
> Craig
_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
