Re: [Nfdump-discuss] How to use the nfdump -t option for RRDTool integration?

Peter Valdemar Mørch Wed, 04 Dec 2013 14:33:00 -0800

I never got any reply to this thread, so I've created a patch based on the
patch in https://sourceforge.net/p/nfdump/bugs/22/. With this new patch, -t
timewin behaves as it always has, unless -S is also given. In that case,
-t timewin only considers the start of a flow to determine whether it
matches.
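
For readers who want the rule spelled out, here is a minimal C sketch of the
time-window test as the attached patch implements it. The function name
flow_in_window and its bool parameter are made up for illustration; only the
comparisons are taken from the patch. Times are epoch seconds, and
twin_start == 0 means that no -t was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Illustrative sketch only (not the nfdump source).
 * first/last: start and end of the flow, in epoch seconds.
 * match_start: true when -S was given.
 */
bool flow_in_window(time_t first, time_t last,
                    time_t twin_start, time_t twin_end,
                    bool match_start)
{
    assert(twin_start == 0 || twin_start <= twin_end);

    if (twin_start == 0)
        return true;              /* no time filter: everything matches */
    if (first < twin_start)
        return false;             /* flow starts before the window */
    if (match_start)
        return first < twin_end;  /* -S: only the start must be inside */
    return last < twin_end;       /* default: the end must be inside too */
}
```

With -t 2013/11/28.18:10-2013/11/28.18:15, a flow running from 18:12:04 to
18:17:14 is rejected by default but accepted with -S. Note that the expression
the patch replaces tested last > twin_end, while the patched code tests
last >= twin_end, so a flow ending exactly on twin_end is treated differently
by the two.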


Apparently several of us would like -t to behave differently, and
this version of the patch does not introduce any backwards-compatibility
issues. Win-win for all, as I see it.

It is a git patch (that also applies cleanly with GNU patch) against
1.6.11. Attached and available here:
https://github.com/pmorch/nfdump/commit/d9ae3d94036639e9b08caa753bd1c867d39d27c8.

I hope it can be included. Or at least commented on.

If I don't hear anything by next week, I'll add it to sourceforge as a
patch and comment on bugs 25 and 22, where I found the original, older patch.

Sincerely,

Peter Mørch


On Fri, Nov 29, 2013 at 1:39 PM, Peter Valdemar Mørch <[email protected]> wrote:

> It looks like I have been bitten by the same behavior as described in
> http://sourceforge.net/p/nfdump/bugs/25/ and
> https://sourceforge.net/p/nfdump/bugs/22/. So I guess Peter understands
> the problem but has decided not to do anything.
>
> Ok.
>
> In the meantime, the patch in https://sourceforge.net/p/nfdump/bugs/22/ can
> be used to achieve what I'm trying to achieve. (It fails to apply because
> there are whitespace differences between bugs/22/ and 1.6.11, but the logic
> is still the same.)
>
> Peter, if someone were to introduce two new options, e.g. -ts and -te that
> behave like -t but operate only on start and end, would you be open to
> accepting such a patch?
>
> Sincerely,
>
> Another :-) Peter
>
>
> On Fri, Nov 29, 2013 at 12:12 AM, Peter Valdemar Mørch <[email protected]> wrote:
>
>> Hi,
>>
>> I want to create statistics for every 5-minute interval based on a filter
>> for use with something similar to RRDTool and I'd like to use nfdump to get
>> it.
>>
>> The simple solution is to just use the appropriately named 5-minute file
>> nfcapd.201311281820, and run a filter on that. But I'm curious as to
>> whether I'm missing a way to do this with the -t option or some other
>> option.
>>
>> Naively, I tried -t 2013/11/28.18:10-2013/11/28.18:15 to get statistics
>> about all the data in the 18:10 - 18:15 interval. But that returned zero
>> flows. It looks like nfdump only includes flows that lie entirely within
>> the from-to times given to -t. See "details" below. (It would be nice if man
>> nfdump went into a little more detail about -t.)
>>
>> But, if some flows are short-lived (e.g. webserver) and some are
>> long-lived (e.g. ssh-connection) I don't see how I can use the -t option to
>> get an idea of how much traffic occurred between 18:10 and 18:15.
>>
>> I guess I was hoping for a -t option that looked exclusively at e.g. flow
>> end, or some way to use flow-end in a filter. Then I could get an idea
>> about traffic in a 5-minute period, where long-running flows would be
>> counted as if they occurred entirely at the flow-end time. But this is
>> exactly what you get by looking at one nfcapd.* file at a time, isn't it?
>>
>> The Rolls Royce would be that if a flow ran from 18:09-18:13, 75% of the
>> traffic from that flow would be added to the 18:10-18:15, because 75% of
>> the time is within that period, but hey, I can see that's a little wild.
>>
>> I also tried to experiment with +/-10, but could not get this to work at
>> all.
>>
>> What we *can* do is use only a single time for -t as in "-t
>> 2013/11/28.18:05" and then look at every flow and discard any that end
>> outside the 18:10-18:15 interval. But that is very time-consuming,
>> especially because one needs to do comparisons on time strings such as
>> "2013-11-28 18:27:46.362". (It would be nice if there were something similar
>> to -N that printed times as unixtime, so the heavy conversion (also in
>> nfdump?) isn't necessary.)
>>
>> Can -t be used to get data from 18:10-18:15? Am I missing something
>> (else) obvious? (Apart from "just use nfsen" - I have other reasons not
>> to, and I'm trying to understand -t in nfdump.)
>>
>> Sincerely,
>>
>> Peter
>>
>> ==========
>>    Details
>> ==========
>>
>> I have a test nfcapd file[1], with NetFlow records from nothing but a
>> single long-running ssh connection.
>>
>> Looking at all the data in the file, I get this:
>>
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp'
>>   Duration Date flow start         Date flow end                Src IP Addr Src Pt      Dst IP Addr Dst Pt
>>    <snip>
>>    300.716 2013-11-28 17:56:31.752 2013-11-28 18:01:32.468    1.2.3.4: 2222   172.22.216.119: 43654
>>    300.710 2013-11-28 18:01:42.485 2013-11-28 18:06:43.195    1.2.3.4: 2222   172.22.216.119: 43654
>>    300.796 2013-11-28 18:06:53.207 2013-11-28 18:11:54.003    1.2.3.4: 2222   172.22.216.119: 43654
>>    310.813 2013-11-28 18:12:04.019 2013-11-28 18:17:14.832    1.2.3.4: 2222   172.22.216.119: 43654
>>    <snip>
>> Summary: total flows: 21, total bytes: 12.1 M, total packets: 33103, avg bps: 14730, avg pps: 5, avg bpp: 364
>> Time window: 2013-11-28 17:20:09 - 2013-11-28 19:09:22
>> Total flows processed: 21, Blocks skipped: 0, Bytes read: 1168
>> Sys: 0.008s flows/second: 2625.0     Wall: 0.000s flows/second: 30882.4
>>
>> And experimenting with the -t option, I see that
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t 2013/11/28.18:12:04-2013/11/28.18:17:14
>> is the tightest I can go with -t and still get any data around that time
>> period.
>>
>> Assuming I know that the NetFlow collector transmits every 5 minutes, I
>> guess I could do -t 18:10-18:20 and then know that I'll likely only get one
>> flow record from the long-running ssh connection, and then in the next
>> period use 18:15-18:25. That would work most of the time (tm) for the ssh
>> connection. But any short-lived flows, e.g. entirely inside 18:18, will be
>> counted in both intervals. :-(
>>
>> About +/-10: How am I supposed to use this?
>>
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t +10
>> Time Window error: No time slot information available
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t -10
>> Time Window error: No time slot information available
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t 2013/11/28.18:12:04+10
>> Time format error at '04+10': unexpected character: '+'.
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t 2013/11/28.18:12:04-10
>> Time format error: '10' unexpected.
>>
>> 1: attached and at http://ge.tt/5Rthyl41/v/0?c
>>
>> --
>> Peter Valdemar Mørch
>> http://www.morch.com
>>
>
>
>
> --
> Peter Valdemar Mørch
> http://www.morch.com
>
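
The "Rolls Royce" idea in the first message quoted above (crediting a flow to
a window in proportion to the time it spends inside it) can be sketched in a
few lines of C. This is only an illustration, under the assumption that a
flow's bytes are spread evenly over its lifetime; bytes_in_window is a
hypothetical helper, not an nfdump function, and times are epoch seconds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not part of nfdump): estimate how many of a flow's
 * bytes fall inside the half-open window [win_start, win_end), assuming
 * the traffic is spread evenly between first and last.
 */
uint64_t bytes_in_window(time_t first, time_t last, uint64_t bytes,
                         time_t win_start, time_t win_end)
{
    assert(first <= last && win_start < win_end);

    if (first == last)   /* zero-length flow: all or nothing */
        return (first >= win_start && first < win_end) ? bytes : 0;

    time_t lo = first > win_start ? first : win_start;
    time_t hi = last < win_end ? last : win_end;
    if (hi <= lo)
        return 0;        /* flow and window do not overlap */

    /* Scale by overlap / duration; integer division rounds down.
     * The product can overflow for extremely large byte counts. */
    return bytes * (uint64_t)(hi - lo) / (uint64_t)(last - first);
}
```

For the example in that message, a flow running 18:09-18:13 with 1000 bytes
contributes 750 bytes to the 18:10-18:15 window.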



-- 
Peter Valdemar Mørch
http://www.morch.com
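
The simpler alternative discussed in the thread, counting each flow entirely
in the interval that contains its start (or its end), comes down to rounding
one timestamp down to a slot boundary. A small sketch, assuming the timestamps
are already available as epoch seconds (which, as the first message notes,
would need a conversion from nfdump's printed time strings); slot_start is a
hypothetical name:

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: start of the fixed-width slot that contains t.
 * With width = 300 (five minutes), timestamps at 18:11:54 and 18:14:59
 * land in the slot starting at 18:10:00.
 */
time_t slot_start(time_t t, time_t width)
{
    assert(t >= 0 && width > 0);
    return t - (t % width);
}
```

Because every timestamp maps to exactly one slot, a short flow inside 18:18 is
counted once, unlike with the overlapping 18:10-18:20 and 18:15-18:25 windows
described in the quoted message.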

From d9ae3d94036639e9b08caa753bd1c867d39d27c8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Peter=20Valdemar=20M=C3=B8rch?= <[email protected]>
Date: Sun, 1 Dec 2013 00:30:36 +0100
Subject: [PATCH] Introduce -S option to make -t look only at start of flow

The -t <timewin> option only includes flows if both the flow's start
and end are inside timewin.

All of these:
http://sourceforge.net/p/nfdump/bugs/22/
http://sourceforge.net/p/nfdump/bugs/25/
and the email in nfdump-discuss with subject:
"How to use the nfdump -t <timewin> option for RRDTool integration?"

describe the same thing.  They would like the -t option to only
consider one end of the flow. See one of them for a more thorough
explanation.

This patch achieves that. It introduces an -S option, that only considers
whether the flow's start is inside timewin.

Instead of this pseudo-code:

    consider(master_record->first < twin_start)
    consider(master_record->last > twin_end)

does this pseudo-code:

    consider(master_record->first < twin_start)
    if (-S option)
        consider(master_record->first >=twin_end)
    else
        consider(master_record->last >= twin_end)

Beware, that in the getopt() line, all the options are listed in one
long line. That is heavily prone to merge conflicts. Also, I've chosen
the -S option, since it was available and a handy mnemonic for
"start".

The -S option was introduced for backwards-compatibility, so the -t
alone continues to work as it has always done.
---
 bin/nfdump.c |   31 ++++++++++++++++++++++++-------
 man/nfdump.1 |    5 +++++
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/bin/nfdump.c b/bin/nfdump.c
index b5ba188..93b145e 100644
--- a/bin/nfdump.c
+++ b/bin/nfdump.c
@@ -252,7 +252,7 @@ static void PrintSummary(stat_record_t *stat_record, int plain_numbers, int csv_
 
 static stat_record_t process_data(char *wfile, int element_stat, int flow_stat, int sort_flows,
 	printer_t print_header, printer_t print_record, time_t twin_start, time_t twin_end, 
-	uint64_t limitflows, int tag, int compress, int do_xstat);
+	uint64_t limitflows, int tag, int compress, int do_xstat, int match_start);
 
 /* Functions */
 
@@ -308,7 +308,8 @@ static void usage(char *name) {
 					"-X\t\tDump Filtertable and exit (debug option).\n"
 					"-Z\t\tCheck filter syntax and exit.\n"
 					"-t <time>\ttime window for filtering packets\n"
-					"\t\tyyyy/MM/dd.hh:mm:ss[-yyyy/MM/dd.hh:mm:ss]\n", name);
+					"\t\tyyyy/MM/dd.hh:mm:ss[-yyyy/MM/dd.hh:mm:ss]\n"
+					"-S\t\ttime window (-t) considers only start\n", name);
 } /* usage */
 
 
@@ -357,7 +358,7 @@ char 		bps_str[NUMBER_STRING_SIZE], pps_str[NUMBER_STRING_SIZE], bpp_str[NUMBER_
 
 stat_record_t process_data(char *wfile, int element_stat, int flow_stat, int sort_flows,
 	printer_t print_header, printer_t print_record, time_t twin_start, time_t twin_end, 
-	uint64_t limitflows, int tag, int compress, int do_xstat) {
+	uint64_t limitflows, int tag, int compress, int do_xstat, int match_start) {
 common_record_t 	*flow_record;
 master_record_t		*master_record;
 nffile_t			*nffile_w, *nffile_r;
@@ -570,7 +571,19 @@ int	v1_map_done = 0;
 
 					// Time based filter
 					// if no time filter is given, the result is always true
-					match  = twin_start && (master_record->first < twin_start || master_record->last > twin_end) ? 0 : 1;
+					if (twin_start) {
+						if (master_record->first < twin_start)
+							match = 0;
+						else {
+							if (match_start) {
+								match =  master_record->first >= twin_end ? 0:1;
+							} else {
+								match =  master_record->last >= twin_end ? 0:1;
+							}
+						}
+					} else
+						match = 1;
+
 					match &= limitflows ? stat_record.numflows < limitflows : 1;
 
 					// filter netflow record with user supplied filter
@@ -716,7 +729,7 @@ char 		*rfile, *Rfile, *Mdirs, *wfile, *ffile, *filter, *tstring, *stat_type;
 char		*byte_limit_string, *packet_limit_string, *print_format, *record_header;
 char		*print_order, *query_file, *UnCompress_file, *nameserver, *aggr_fmt;
 int 		c, ffd, ret, element_stat, fdump;
-int 		i, user_format, quiet, flow_stat, topN, aggregate, aggregate_mask, bidir;
+int 		i, user_format, quiet, flow_stat, topN, aggregate, aggregate_mask, bidir, match_start;
 int 		print_stat, syntax_only, date_sorted, do_tag, compress, do_xstat;
 int			plain_numbers, GuessDir, pipe_output, csv_output;
 time_t 		t_start, t_end;
@@ -730,6 +743,7 @@ char 		Ident[IDENTLEN];
 	fdump = aggregate = 0;
 	aggregate_mask	= 0;
 	bidir			= 0;
+	match_start		= 0;
 	t_start = t_end = 0;
 	syntax_only	    = 0;
 	topN	        = 10;
@@ -767,7 +781,7 @@ char 		Ident[IDENTLEN];
 
 	for ( i=0; i<AGGR_SIZE; AggregateMasks[i++] = 0 ) ;
 
-	while ((c = getopt(argc, argv, "6aA:Bbc:D:E:s:hHn:i:j:f:qzr:v:w:K:M:NImO:R:XZt:TVv:x:l:L:o:")) != EOF) {
+	while ((c = getopt(argc, argv, "6aA:Bbc:D:E:s:hHn:i:j:f:qzr:v:w:K:M:NImO:R:XZt:STVv:x:l:L:o:")) != EOF) {
 		switch (c) {
 			case 'h':
 				usage(argv[0]);
@@ -831,6 +845,9 @@ char 		Ident[IDENTLEN];
                     exit(255);
                 } 
 				break;
+			case 'S':
+				match_start = 1;
+				break;
 			case 'V': {
 				char *e1, *e2;
 				e1 = "";
@@ -1172,7 +1189,7 @@ char 		Ident[IDENTLEN];
 	nfprof_start(&profile_data);
 	sum_stat = process_data(wfile, element_stat, aggregate || flow_stat, print_order != NULL,
 						print_header, print_record, t_start, t_end, 
-						limitflows, do_tag, compress, do_xstat);
+						limitflows, do_tag, compress, do_xstat, match_start);
 	nfprof_end(&profile_data, total_flows);
 
 	if ( total_bytes == 0 ) {
diff --git a/man/nfdump.1 b/man/nfdump.1
index 56e1bc9..bfaa67f 100755
--- a/man/nfdump.1
+++ b/man/nfdump.1
@@ -74,6 +74,11 @@ onwards. The time window may also be specified as +/\- n. In this case
 it is relativ to the beginning or end of all flows. +10 means the first
 10 seconds of all flows, \-10 means the last 10 seconds of all flows.
 .TP 3
+.B -S
+Controls how the timewin is interpreted. If specified, the flow is included if
+the start of the flow is inside the timewin. Otherwise the entire flow needs to
+be inside timewin.
+.TP 3
 .B -c \fInum
 Limit number of records to process to the first \fInum\fR flows.
 .TP 3
-- 
1.7.2.5

_______________________________________________
Nfdump-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nfdump-discuss
