Hi Vaggelis,

I fully agree SNMP should be taken as ground truth; I have a couple of
suggestions on how to start troubleshooting this.

Granted, I'm no expert on RouterOS; if it has a NetFlow export process,
can you check whether it pegs the CPU at 100%? Or whether anything
suspicious emerges from the router logs?

On the nfacctd side, if the logs are clean then internal buffering
should be OK. Still, it is better to double-check buffering between the
kernel and nfacctd. In this regard, can you please follow the notes in
section D of chapter XXI of a recent pmacct QUICKSTART guide
(see https://github.com/paololucente/pmacct/blob/master/pmacct/QUICKSTART ),
essentially to check whether there are any UDP drops?

Finally, I see sql_refresh_time and sql_history are set to different
values, meaning SQL UPDATE queries are involved; this is OK as long as
the database itself does not suffer from them. Can you check that SQL
writer processes are not piling up? This can be done with a simple
"ps auxw | grep nfacctd".

Cheers,
Paolo

On Fri, Nov 27, 2015 at 04:48:48PM +0200, Vaggelis Koutroumpas wrote:
> Hello,
>
> I am using nfacct with Mikrotik RouterOS to account for the traffic our
> clients do each month.
> I do aggregation per IP to get the total bytes for each IP for all our
> prefixes.
>
> nfacct seems to be working fine with Mikrotik (it receives the flows
> without any errors when running in debug mode).
> The problem I encounter is that there are significant discrepancies
> between what nfacct counts and what other tools count.
>
> I compare the nfacct results with Solarwinds (netflow) and Observium (SNMP).
> I understand that SNMP will show different numbers since it counts the
> switch ports octets including the ethernet overhead data etc (I've
> included a 26bytes adjb on my nfacct config though to account for that
> as per pmacct FAQ).
> But even between 2 netflow collectors the data are different.
>
> Actually even between 2 different databases of nfacct data (using the
> same nfacct instance) the data are not consistent.
>
> For example for today (27-11-2015) until the time of this writing, all 4
> implementations have different values.
>
> -------
> Observium/SNMP:
> Total IN: 69.12GB
> Total OUT: 318.22GB
>
> Solarwinds/Netflow:
> Total IN: 60.4GB
> Total OUT: 315GB
>
> nfacct (history 1d, refresh 60):
> Total IN: 69.20GB
> Total OUT: 302.74GB
>
> nfacct (history 5m, refresh 60):
> Total IN: 68.44GB
> Total OUT: 300.04GB
> -------
>
> The above (nfacct) numbers where calculated using standard SQL queries
> such as:
> SELECT (
> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB') as bytes
> FROM netflow
> WHERE ip_dst = '0.0.0.0' AND stamp_inserted = '2015-11-27 00:00:00'
> ) as total_out, (
> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB') as bytes
> FROM netflow
> WHERE ip_src = '0.0.0.0' AND stamp_inserted = '2015-11-27 00:00:00'
> ) as total_in
>
> So which of the above are the "correct" values?
> Since our datacenter charges us based on their SNMP counters on our
> uplink ports, and since we have crosschecked their measurements with
> ours (observium) and are the exact same, I take the SNMP/Observium
> results as my comparison baseline.
>
> I've been beating myself for the last 2 weeks trying to figure out
> what's causing those skewed numbers.
> On my lab where the traffic is controlled during tests I can do file
> transfers and account every last byte without any discrepancies.
> But when running the same config on the production site, I never get
> consistent data (but there is also way more traffic and more IPs
> generating that traffic)
>
>
> Here is my nfacct config:
>
> ------
> daemonize: true
> pidfile: /var/run/nfacctd.pid
> sql_db: pmacct
> sql_host: localhost
> sql_user: *****
> sql_passwd: *****
> nfacctd_port: 2055
>
> plugin_pipe_size: 16384000
> plugin_buffer_size: 16384
>
> # 5min time-bins
> aggregate[total_in]: dst_host
> aggregate[total_out]: src_host
> aggregate_filter[total_in]: dst net 2a00:xxxx:xxxx::/48 or dst net
> 31.xx.xx.0/21 or dst net 185.xx.xx.0/22 or dst net 62.xx.xx.0/24 or dst
> net 194.xx.xx.0/24
> aggregate_filter[total_out]: src net 2a00:xxx:xxx::/48 or src net
> 31.xx.xx.0/21 or src net 185.xx.xx.0/22 or src net 62.xx.xx.0/24 or src
> net 194.xx.xx.0/24
> sql_table[total_in]: traffic
> sql_table[total_out]: traffic
> sql_refresh_time[total_in]: 60
> sql_refresh_time[total_out]: 60
> sql_history[total_in]: 5m
> sql_history[total_out]: 5m
> sql_history_roundoff[total_in]: mh
> sql_history_roundoff[total_out]: mh
> sql_table_version[total_in]: 4
> sql_table_version[total_out]: 4
> sql_preprocess[total_in]: adjb=+26
> sql_preprocess[total_out]: adjb=+26
>
>
> # daily time-bins
> aggregate[daily_in]: dst_host
> aggregate[daily_out]: src_host
> aggregate_filter[daily_in]: dst net 2a00:xxxx:xxxx::/48 or dst net
> 31.xx.xx.0/21 or dst net 185.xx.xx.0/22 or dst net 62.xx.xx.0/24 or dst
> net 194.xx.xx.0/24
> aggregate_filter[daily_out]: src net 2a00:xxx:xxx::/48 or src net
> 31.xx.xx.0/21 or src net 185.xx.xx.0/22 or src net 62.xx.xx.0/24 or src
> net 194.xx.xx.0/24
> sql_table[daily_in]: traffic_daily
> sql_table[daily_out]: traffic_daily
> sql_refresh_time[daily_in]: 60
> sql_refresh_time[daily_out]: 60
> sql_history[daily_in]: 1d
> sql_history[daily_out]: 1d
> sql_history_roundoff[daily_in]: mh
> sql_history_roundoff[daily_out]: mh
> sql_table_version[daily_in]: 4
> sql_table_version[daily_out]: 4
> sql_preprocess[daily_in]: adjb=+26
> sql_preprocess[daily_out]: adjb=+26
>
>
> plugins: mysql[total_in], mysql[total_out], mysql[daily_in],
> mysql[daily_out]
> ------
>
> And here's my Mikrotik Traffic Flow (netflow) configuration:
>
> ------
> /ip traffic-flow
> set active-flow-timeout=1m cache-entries=1k enabled=yes interfaces=sfp1
> /ip traffic-flow target
> add dst-address=X.X.X.X v9-template-refresh=60 v9-template-timeout=1m
> ------
>
>
> Can anyone think of a reason I get such inconsistent results? Is there
> something I miss?
> Let me know if you need any further information.
>
> Thanks.
>
>
> _______________________________________________
> pmacct-discussion mailing list
> http://www.pmacct.net/#mailinglists
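
P.S. On the UDP drops check suggested in the reply: here is a minimal
sketch of one way to look at it, assuming the collector runs on Linux,
where /proc/net/snmp exposes per-protocol counters. It is not the
procedure from the QUICKSTART guide. The function names
parse_udp_counters and udp_drop_delta and the sample counter values are
made up for illustration; only the "Udp:" header/value line layout and
the InErrors and RcvbufErrors column names come from the kernel.

```python
def parse_udp_counters(snmp_text):
    """Return the 'Udp:' counters from /proc/net/snmp-style text as a dict.

    The file holds two 'Udp:' lines: the first lists the column names,
    the second lists the matching values.
    """
    header = None
    for line in snmp_text.splitlines():
        if not line.startswith("Udp:"):
            continue  # skips other protocols, including 'UdpLite:'
        fields = line.split()[1:]
        if header is None:
            header = fields  # first 'Udp:' line: column names
        else:
            return dict(zip(header, (int(v) for v in fields)))
    return {}


def udp_drop_delta(before, after, keys=("InErrors", "RcvbufErrors")):
    """Return how much each drop-related counter grew between two samples."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


# Hypothetical samples, as if /proc/net/snmp had been read twice
# while nfacctd was receiving flows.
SAMPLE_BEFORE = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 1000 2 5 300 5 0\n"
)
SAMPLE_AFTER = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 5000 2 45 300 45 0\n"
)

delta = udp_drop_delta(
    parse_udp_counters(SAMPLE_BEFORE),
    parse_udp_counters(SAMPLE_AFTER),
)
print(delta)
```

On a real collector the two samples would be the contents of
/proc/net/snmp read a minute or so apart. A RcvbufErrors value that
keeps growing would suggest datagrams are being dropped before nfacctd
reads them, although the counters are system-wide and not specific to
the nfacctd socket.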
