Hi Vaggelis,

I fully agree SNMP should be taken as ground truth; I have a couple
of suggestions on how to start troubleshooting this.

Granted, I'm no expert on RouterOS; if it has a NetFlow export process,
can you check whether it pegs the CPU at 100%? Or whether anything
suspicious emerges from the router logs?

On the nfacctd side, if logs are clean then it should mean internal
buffering is OK. Still, better to double-check buffering between the
kernel and nfacctd. On this point, can you please follow the notes in
section D of chapter XXI of a recent pmacct QUICKSTART guide (see
https://github.com/paololucente/pmacct/blob/master/pmacct/QUICKSTART),
essentially to check whether there are any UDP drops?
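
As a rough illustration of the UDP drops check (not a replacement for
the QUICKSTART notes), here is a minimal Python sketch, assuming the
collector runs on Linux where /proc/net/snmp exposes the system-wide
UDP counters InErrors and RcvbufErrors. The function names are made up
for this example; the counters cover every UDP socket on the box, not
just nfacctd's.

```python
import time


def udp_counters(snmp_text):
    """Parse the 'Udp:' header/value line pair of /proc/net/snmp-style text."""
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = udp_lines[0], udp_lines[1]
    # Skip the leading 'Udp:' token and pair each counter name with its value.
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def udp_drop_deltas(interval=10, path="/proc/net/snmp"):
    """Sample the counters twice and return how much the error counters grew.

    Assumption: Linux host; counters are system-wide, not per-socket.
    """
    with open(path) as f:
        before = udp_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = udp_counters(f.read())
    return {key: after.get(key, 0) - before.get(key, 0)
            for key in ("InErrors", "RcvbufErrors")}
```

Calling udp_drop_deltas() while traffic is flowing and seeing non-zero
deltas, RcvbufErrors in particular, would hint at packets being dropped
between the kernel and the collector.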

Finally, I see sql_refresh_time and sql_history are set to different
values - meaning SQL UPDATE queries are involved; this is OK as long
as the actual database does not suffer from them; can you check that
SQL writer processes are not piling up? This can be done with a simple
"ps auxw | grep nfacctd".
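
If you prefer to sample this over time rather than eyeball it, a
minimal Python sketch follows. It only counts the lines of "ps auxw"
output that contain a given substring; the helper names and the default
substring "nfacctd" are choices made for this example, and the count
includes the core process and the plugin processes, not only the
writers.

```python
import subprocess


def count_processes(ps_output, needle="nfacctd"):
    """Count the lines of ps output that contain the given substring."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def running_nfacctd_count(needle="nfacctd"):
    """Run 'ps auxw' and count matching processes (assumes a Unix-like host)."""
    result = subprocess.run(["ps", "auxw"], capture_output=True,
                            text=True, check=True)
    return count_processes(result.stdout, needle)
```

Sampling running_nfacctd_count() every minute or so and seeing the
number keep growing would suggest writers are not finishing within the
60-second refresh interval.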

Cheers,
Paolo

On Fri, Nov 27, 2015 at 04:48:48PM +0200, Vaggelis Koutroumpas wrote:
> Hello,
> 
> I am using nfacct with Mikrotik RouterOS to account for the traffic our
> clients do each month.
> I do aggregation per IP to get the total bytes for each IP for all our
> prefixes.
> 
> nfacct seems to be working fine with Mikrotik (it receives the flows
> without any errors when running in debug mode).
> The problem I encounter is that there are significant discrepancies
> between what nfacct counts and what other tools count.
> 
> I compare the nfacct results with Solarwinds (netflow) and Observium (SNMP).
> I understand that SNMP will show different numbers since it counts the
> switch ports octets including the ethernet overhead data etc (I've
> included a 26bytes adjb on my nfacct config though to account for that
> as per pmacct FAQ).
> But even between 2 netflow collectors the data are different.
> 
> Actually even between 2 different databases of nfacct data (using the
> same nfacct instance) the data are not consistent.
> 
> For example for today (27-11-2015) until the time of this writing, all 4
> implementations have different values.
> 
> -------
> Observium/SNMP:
> Total IN: 69.12GB
> Total OUT: 318.22GB
> 
> Solarwinds/Netflow:
> Total IN: 60.4GB
> Total OUT: 315GB
> 
> nfacct (history 1d, refresh 60):
> Total IN: 69.20GB
> Total OUT: 302.74GB
> 
> nfacct (history 5m, refresh 60):
> Total IN: 68.44GB
> Total OUT: 300.04GB
> -------
> 
> The above (nfacct) numbers were calculated using standard SQL queries
> such as:
> SELECT (
>     SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB') as bytes
>     FROM netflow
>     WHERE ip_dst = '0.0.0.0' AND stamp_inserted = '2015-11-27 00:00:00'
> ) as total_out, (
>     SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB') as bytes
>     FROM netflow
>     WHERE ip_src = '0.0.0.0' AND stamp_inserted = '2015-11-27 00:00:00'
> ) as total_in
> 
> So which of the above are the "correct" values?
> Since our datacenter charges us based on their SNMP counters on our
> uplink ports, and since we have crosschecked their measurements with
> ours (observium) and are the exact same, I take the SNMP/Observium
> results as my comparison baseline.
> 
> I've been beating myself for the last 2 weeks trying to figure out
> what's causing those skewed numbers.
> On my lab where the traffic is controlled during tests I can do file
> transfers and account every last byte without any discrepancies.
> But when running the same config on the production site, I never get
> consistent data (but there is also way more traffic and more IPs
> generating that traffic)
> 
> 
> Here is my nfacct config:
> 
> ------
> daemonize: true
> pidfile: /var/run/nfacctd.pid
> sql_db: pmacct
> sql_host: localhost
> sql_user: *****
> sql_passwd: *****
> nfacctd_port: 2055
> 
> plugin_pipe_size: 16384000
> plugin_buffer_size: 16384
> 
> # 5min time-bins
> aggregate[total_in]: dst_host
> aggregate[total_out]: src_host
> aggregate_filter[total_in]: dst net 2a00:xxxx:xxxx::/48 or dst net
> 31.xx.xx.0/21 or dst net 185.xx.xx.0/22 or dst net 62.xx.xx.0/24 or dst
> net 194.xx.xx.0/24
> aggregate_filter[total_out]: src net 2a00:xxx:xxx::/48 or src net
> 31.xx.xx.0/21 or src net 185.xx.xx.0/22 or src net 62.xx.xx.0/24 or src
> net 194.xx.xx.0/24
> sql_table[total_in]: traffic
> sql_table[total_out]: traffic
> sql_refresh_time[total_in]: 60
> sql_refresh_time[total_out]: 60
> sql_history[total_in]: 5m
> sql_history[total_out]: 5m
> sql_history_roundoff[total_in]: mh
> sql_history_roundoff[total_out]: mh
> sql_table_version[total_in]: 4
> sql_table_version[total_out]: 4
> sql_preprocess[total_in]: adjb=+26
> sql_preprocess[total_out]: adjb=+26
> 
> 
> # daily time-bins
> aggregate[daily_in]: dst_host
> aggregate[daily_out]: src_host
> aggregate_filter[daily_in]: dst net 2a00:xxxx:xxxx::/48 or dst net
> 31.xx.xx.0/21 or dst net 185.xx.xx.0/22 or dst net 62.xx.xx.0/24 or dst
> net 194.xx.xx.0/24
> aggregate_filter[daily_out]: src net 2a00:xxx:xxx::/48 or src net
> 31.xx.xx.0/21 or src net 185.xx.xx.0/22 or src net 62.xx.xx.0/24 or src
> net 194.xx.xx.0/24
> sql_table[daily_in]: traffic_daily
> sql_table[daily_out]: traffic_daily
> sql_refresh_time[daily_in]: 60
> sql_refresh_time[daily_out]: 60
> sql_history[daily_in]: 1d
> sql_history[daily_out]: 1d
> sql_history_roundoff[daily_in]: mh
> sql_history_roundoff[daily_out]: mh
> sql_table_version[daily_in]: 4
> sql_table_version[daily_out]: 4
> sql_preprocess[daily_in]: adjb=+26
> sql_preprocess[daily_out]: adjb=+26
> 
> 
> plugins: mysql[total_in], mysql[total_out], mysql[daily_in],
> mysql[daily_out]
> ------
> 
> And here's my Mikrotik Traffic Flow (netflow) configuration:
> 
> ------
> /ip traffic-flow
> set active-flow-timeout=1m cache-entries=1k enabled=yes interfaces=sfp1
> /ip traffic-flow target
> add dst-address=X.X.X.X v9-template-refresh=60 v9-template-timeout=1m
> ------
> 
> 
> Can anyone think of a reason I get such inconsistent results? Is there
> something I miss?
> Let me know if you need any further information.
> 
> Thanks.
> 
> 
> _______________________________________________
> pmacct-discussion mailing list
> http://www.pmacct.net/#mailinglists
