Hi Brian,
This time-wrap is a kind of hassle since ever. It actually has been tested on
many exporters - ok mainly on Cisco's, as it seemed to be used widely.
I haven't looked into softflowd overruns, as I expected them too behave the
same.
I am open to help debugging the stuff, you have me a pcap with an overflow,
which could be rather difficult to produce.
Alternatively, you could add some LogInfo() messages in case of a wrap around
to log the original values to see how compensation needs to be done correctly
or even better to fix softflowd.
Cheers
- Peter
On 02.06.16 16:51, Brian Candler wrote:
> In one site, I am generating flow records from a pfSense (2.3) box
> running softflowd (1.2.1).
>
> On the collector, nfdump (1.6.13) is compiled with --enable-nsel because
> there is also an ASA generating flows.
>
> Looking at the pfSense flows, I am seeing some strange summary records
> where the duration looks to be a 2^32 wraparound (and hence the bps
> values are completely wrong).
>
> $ nfdump -M /var/nfsen/profiles-data/live/lch-fw1 -T -r
> 2016/06/02/nfcapd.201606020300 -c 10 -s proto/flows 'src host 10.26.1.30
> and dst host xxx.xx.xx.1'
> Top 10 Protocol ordered by flows:
> Date first seen Duration Proto Protocol Flows(%)
> Packets(%) Bytes(%) pps bps bpp
> 2016-04-13 09:57:55.236 4294010.415 ICMP 1 1(100.0)
> 24(100.0) 2016(100.0) 0 0 84
>
> Summary: total flows: 1, total bytes: 2016, total packets: 24, avg bps:
> 0, avg pps: 0, avg bpp: 84
> Time window: 2016-04-13 09:38:02 - 2016-06-02 03:02:51
> Total flows processed: 5373, Blocks skipped: 0, Bytes read: 349768
> Sys: 0.004s flows/second: 1343250.0 Wall: 0.003s flows/second: 1704632.0
>
> Note that (4294010.415 * 1000) = 0xfff1662f which is suspiciously near
> to 2^32.
>
> The pfsense box generating these records has been up for 38 days, which
> means it booted on 23rd April. Hence the "Date first seen" of 13th April
> for the flow record doesn't make much sense.
>
> I purposedly chose a summary over a single flow for the example above.
> Looking at the flow itself:
>
> $ nfdump -M /var/nfsen/profiles-data/live/lch-fw1 -T -r
> 2016/06/02/nfcapd.201606020300 -c 10 -o raw 'src host 10.26.1.30 and dst
> host xxx.xx.xx.1'
>
> Flow Record:
> Flags = 0x06 FLOW, Unsampled
> export sysid = 1
> size = 56
> first = 1460537875 [2016-04-13 09:57:55]
> last = 1464831885 [2016-06-02 02:44:45]
> msec_first = 236
> msec_last = 651
> src addr = 10.26.1.30
> dst addr = xxx.xx.xx.1
> ICMP = 8.0 type.code
> fwd status = 0
> tcp flags = 0x00 ......
> proto = 1 ICMP
> (src)tos = 0
> (in)packets = 24
> (in)bytes = 2016
>
> Summary: total flows: 1, total bytes: 2016, total packets: 24, avg bps:
> 0, avg pps: 0, avg bpp: 84
> Time window: 2016-04-13 09:38:02 - 2016-06-02 03:02:51
> Total flows processed: 5373, Blocks skipped: 0, Bytes read: 349768
> Sys: 0.004s flows/second: 1343250.0 Wall: 0.004s flows/second: 1295323.0
>
> The "first" timestamp appears to be April 13th.
>
> If I simply add (2^32 / 1000) to the start time, I get:
>
> new first = 1464832842
>
> last = 1464831885
>
> ...but this gives a "first" time which is now 957 seconds *after* the
> last time :-(
>
> Looking at the source code I found this (calculations are in
> milliseconds at this point):
>
> uint32_t First = data_record->first;
> uint32_t Last = data_record->last;
>
> if ( First > Last )
> /* First in msec, in case of msec
> overflow, between start and end */
> start_time = exporter->boot_time -
> 0x100000000LL + (uint64_t)First;
> else
> start_time = (uint64_t)First +
> exporter->boot_time;
>
> /* end time in msecs */
> end_time = (uint64_t)Last + exporter->boot_time;
>
> if ( (end_time - start_time) > 0xffc00000 &&
> table->bytes < 2000 ) {
> dbg_printf("CISCO bugfix!\n");
> start_time += 0xffc00000;
> }
>
> There's certainly a subtraction of 2^32 possible there; this implies
> that softflowd is giving a flow with First > Last.
>
> I don't think the 0xffc00000 branch applies here: firstly it would bring
> the start_time forward not back, and secondly the (in) bytes count is
> greater than 2000.
>
> However, adding (0xffc00000 / 1000) to the start time, I get a value
> which is about 54 minutes before the finish time, which is vaguely
> plausible.
>
> Any suggestions for what is happening here, or how I might go about
> debugging it? Or has anyone else seen issues like this with softflowd
> and flow start/end timestamps?
>
> Thanks,
>
> Brian.
>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
> patterns at an interface-level. Reveals which users, apps, and protocols are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
> _______________________________________________
> Nfdump-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nfdump-discuss
>
--
Be nice to your netflow data. Use NfSen and nfdump :)
------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are
consuming the most bandwidth. Provides multi-vendor support for NetFlow,
J-Flow, sFlow and other flows. Make informed decisions using capacity
planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
_______________________________________________
Nfdump-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nfdump-discuss