I never got any reply to this thread, so I've created a patch based on the
patch in https://sourceforge.net/p/nfdump/bugs/22/. With this new patch, -t
timewin behaves as it always has unless -S is also given. In that case, -t
timewin only considers the start of a flow when deciding whether it matches.
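A minimal C sketch of the match decision, assuming only the two timestamps the filter reads. `flow_times_t` and `time_window_match()` are hypothetical names for illustration, not nfdump API; the branch structure follows the if/else the patch adds to process_data().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the two master_record_t fields the filter
 * reads; nfdump's real record has many more members. */
typedef struct {
    time_t first; /* flow start, seconds */
    time_t last;  /* flow end, seconds */
} flow_times_t;

/* Same decision as the patched filter: returns 1 if the flow passes
 * the -t filter, 0 otherwise. twin_start == 0 means no -t was given. */
static int time_window_match(const flow_times_t *f, time_t twin_start,
                             time_t twin_end, int match_start) {
    assert(f != NULL);
    if (!twin_start)
        return 1;                   /* no time window: everything matches */
    if (f->first < twin_start)
        return 0;                   /* flow started before the window */
    if (match_start)
        return f->first < twin_end; /* -S: only the start must lie inside */
    return f->last < twin_end;      /* default: the end must lie inside too */
}
```

Taking times as seconds since midnight, the ssh records 18:06:53-18:11:54 (65213-65514) and 18:12:04-18:17:14 (65524-65834) are both rejected for the window 18:10-18:15 (65400-65700) without -S; with -S the second one matches. Note that the default branch in the patch tests `last >= twin_end`, whereas the line it replaces tested `last > twin_end`, so a flow whose last second equals twin_end is handled differently at that one-second boundary.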
Apparently several of us would like -t to behave differently, and
this version of the patch does not introduce any backwards-compatibility
issues. Win-win for all, as I see it.
It is a git patch (that also applies cleanly with GNU patch) against
1.6.11. Attached and available here:
https://github.com/pmorch/nfdump/commit/d9ae3d94036639e9b08caa753bd1c867d39d27c8.
I hope it can be included. Or at least commented on.
If I don't hear anything by next week, I'll add it to SourceForge as a
patch and comment on bugs 25 and 22, where I found the original, older patch.
Sincerely,
Peter Mørch
On Fri, Nov 29, 2013 at 1:39 PM, Peter Valdemar Mørch <[email protected]> wrote:
> It looks like I have been bitten by the same behavior as described in
> http://sourceforge.net/p/nfdump/bugs/25/ and
> https://sourceforge.net/p/nfdump/bugs/22/. So I guess Peter understands
> the problem but has decided not to do anything.
>
> Ok.
>
> In the meantime, the patch in https://sourceforge.net/p/nfdump/bugs/22/ can
> be used to achieve what I'm trying to achieve. (It fails to apply because of
> whitespace differences between bugs/22/ and 1.6.11, but the logic is
> still the same.)
>
> Peter, if someone were to introduce two new options, e.g. -ts and -te that
> behave like -t but operate only on start and end, would you be open to
> accepting such a patch?
>
> Sincerely,
>
> Another :-) Peter
>
>
> On Fri, Nov 29, 2013 at 12:12 AM, Peter Valdemar Mørch <[email protected]> wrote:
>
>> Hi,
>>
>> I want to create statistics for every 5-minute interval based on a filter
>> for use with something similar to RRDTool and I'd like to use nfdump to get
>> it.
>>
>> The simple solution is to just use the appropriately named 5-minute file
>> nfcapd.201311281820, and run a filter on that. But I'm curious as to
>> whether I'm missing a way to do this with the -t option or some other
>> option.
>>
>> Naively, I tried -t 2013/11/28.18:10-2013/11/28.18:15 to get statistics
>> about all the data in the 18:10 - 18:15 interval. But that returned zero
>> flows. It looks like nfdump only includes flows whose entire duration lies
>> within the from-to times given to -t. See "details" below. (It would be nice
>> if the nfdump man page went into a little more detail about -t.)
>>
>> But if some flows are short-lived (e.g. a webserver) and some are
>> long-lived (e.g. an ssh connection), I don't see how I can use the -t option
>> to get an idea of how much traffic occurred between 18:10 and 18:15.
>>
>> I guess I was hoping for a -t option that looked exclusively at e.g. flow
>> end, or some way to use flow end in a filter. Then I could get an idea
>> about traffic in a 5-minute period, where long-running flows would be
>> counted as if they occurred entirely at the flow-end time. But this is
>> exactly what you get by looking at one nfcapd.* file at a time, isn't it?
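Binning by flow end amounts to rounding each flow's end time down to the start of its 5-minute bin, so every flow lands in exactly one bin. A minimal C sketch, assuming integer epoch-second timestamps; `flow_end_bytes_t`, `bin_start()` and `bytes_in_bin()` are hypothetical names, not part of nfdump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical record: just the flow end time and its byte count. */
typedef struct {
    time_t   last;  /* flow end, seconds */
    uint64_t bytes; /* bytes in the flow */
} flow_end_bytes_t;

/* Start of the bin_len-second bin that contains t (t >= 0). */
static time_t bin_start(time_t t, time_t bin_len) {
    assert(bin_len > 0 && t >= 0);
    return t - t % bin_len;
}

/* Sum the bytes of all flows whose end falls in the bin starting at bin. */
static uint64_t bytes_in_bin(const flow_end_bytes_t *flows, size_t n,
                             time_t bin, time_t bin_len) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (bin_start(flows[i].last, bin_len) == bin)
            total += flows[i].bytes;
    return total;
}
```

Because the bins do not overlap, a short flow is never counted twice; the trade-off is that all of a long-running flow's bytes land in the bin where it ends.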
>>
>> The Rolls Royce would be that if a flow ran from 18:09-18:13, 75% of the
>> traffic from that flow would be added to the 18:10-18:15 interval, because
>> 75% of its duration is within that period. But hey, I can see that's a
>> little wild.
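The proportional split is straightforward if traffic is assumed to be spread evenly over a flow's duration, which NetFlow records do not guarantee. A minimal C sketch under that assumption; `prorated_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Bytes of the flow [first, last] attributed to the interval
 * [win_start, win_end), assuming an even rate over the flow's duration. */
static uint64_t prorated_bytes(time_t first, time_t last, uint64_t bytes,
                               time_t win_start, time_t win_end) {
    assert(first <= last && win_start < win_end);
    if (first == last) /* zero-length flow: all or nothing */
        return (first >= win_start && first < win_end) ? bytes : 0;
    time_t lo = first > win_start ? first : win_start; /* overlap start */
    time_t hi = last < win_end ? last : win_end;       /* overlap end */
    if (hi <= lo)
        return 0; /* no overlap */
    /* bytes * overlap / duration, in integer arithmetic */
    return bytes * (uint64_t)(hi - lo) / (uint64_t)(last - first);
}
```

For the 18:09-18:13 example (65340-65580 in seconds since midnight), a 1000-byte flow overlaps 18:10-18:15 for 180 of its 240 seconds, so 750 bytes go to that interval and 250 to the one before. The product `bytes * overlap` can overflow for very large byte counts; a floating-point ratio avoids that at the cost of rounding.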
>>
>> I also tried to experiment with +/-10, but could not get this to work at
>> all.
>>
>> What we *can* do is use only a single time for -t, as in "-t
>> 2013/11/28.18:05", and then look at every flow and discard any that end
>> outside the 18:10-18:15 interval. But that is very time-consuming,
>> especially because one needs to do comparisons on time strings such as
>> "2013-11-28 18:27:46.362". (It would be nice if there was something similar
>> to -N that printed times as Unix time, so the heavy conversion (also in
>> nfdump?) isn't necessary.)
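The discard step need not convert the strings at all: timestamps in the fixed-width, zero-padded "YYYY-MM-DD hh:mm:ss.mmm" form shown above sort chronologically under plain byte-wise comparison, provided they all share the same format and time zone. A minimal C sketch; `ts_in_window()` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* 1 if ts lies in [win_start, win_end), comparing the strings byte-wise.
 * Valid only for equal-length, zero-padded timestamps in one time zone. */
static int ts_in_window(const char *ts, const char *win_start,
                        const char *win_end) {
    assert(strlen(ts) == strlen(win_start) && strlen(ts) == strlen(win_end));
    return strcmp(ts, win_start) >= 0 && strcmp(ts, win_end) < 0;
}
```

For example, a flow ending at "2013-11-28 18:11:54.003" falls inside the window "2013-11-28 18:10:00.000" to "2013-11-28 18:15:00.000", while one ending at "2013-11-28 18:17:14.832" does not.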
>>
>> Can -t be used to get data from 18:10-18:15? Am I missing something
>> (else) obvious? (Apart from "just use nfsen": I have other reasons not
>> to, and I'm trying to understand -t in nfdump.)
>>
>> Sincerely,
>>
>> Peter
>>
>> ==========
>> Details
>> ==========
>>
>> I have a test nfcapd file[1], with NetFlow records from nothing but a
>> single long-running ssh connection.
>>
>> Looking at all the data in the file, I get this:
>>
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp'
>> Duration Date flow start Date flow end Src IP Addr Src Pt Dst IP Addr Dst Pt
>> <snip>
>> 300.716 2013-11-28 17:56:31.752 2013-11-28 18:01:32.468 1.2.3.4: 2222 172.22.216.119: 43654
>> 300.710 2013-11-28 18:01:42.485 2013-11-28 18:06:43.195 1.2.3.4: 2222 172.22.216.119: 43654
>> 300.796 2013-11-28 18:06:53.207 2013-11-28 18:11:54.003 1.2.3.4: 2222 172.22.216.119: 43654
>> 310.813 2013-11-28 18:12:04.019 2013-11-28 18:17:14.832 1.2.3.4: 2222 172.22.216.119: 43654
>> <snip>
>> Summary: total flows: 21, total bytes: 12.1 M, total packets: 33103, avg bps: 14730, avg pps: 5, avg bpp: 364
>> Time window: 2013-11-28 17:20:09 - 2013-11-28 19:09:22
>> Total flows processed: 21, Blocks skipped: 0, Bytes read: 1168
>> Sys: 0.008s flows/second: 2625.0 Wall: 0.000s flows/second: 30882.4
>>
>> And experimenting with the -t option, I see that
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t 2013/11/28.18:12:04-2013/11/28.18:17:14
>> is the tightest I can go with -t and still get any data around that time
>> period.
>>
>> Assuming I know that the NetFlow collector transmits every 5 minutes, I
>> guess I could do -t 18:10-18:20 and then know that I'll likely only get one
>> flow record from the long-running ssh connection, and then in the next
>> period use 18:15-18:25. That would work most of the time (tm) for the ssh
>> connection. But any short lived flows e.g. entirely inside 18:18 will be
>> counted in both intervals. :-(
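The double counting follows directly from overlapping windows combined with whole-flow containment. A minimal C sketch that counts how many windows [k*step, k*step + len) fully contain a flow; `windows_containing()` is a hypothetical helper, and the times below are seconds since midnight for readability.

```c
#include <assert.h>
#include <time.h>

/* Number of windows [k*step, k*step + len), k = k_min..k_max, that fully
 * contain the flow [first, last]. When len > step the windows overlap, so
 * a short flow can be contained in (and counted by) more than one. */
static int windows_containing(time_t first, time_t last, time_t step,
                              time_t len, long k_min, long k_max) {
    assert(step > 0 && len > 0 && first <= last);
    int count = 0;
    for (long k = k_min; k <= k_max; k++) {
        time_t ws = (time_t)k * step; /* window start */
        if (first >= ws && last < ws + len)
            count++;
    }
    return count;
}
```

With 10-minute windows stepped by 5 minutes (step 300, len 600), a flow from 18:18:10 to 18:18:40 (65890-65920) is contained in both 18:10-18:20 and 18:15-18:25, while the ssh record 18:12:04-18:17:14 (65524-65834) is contained in exactly one. With non-overlapping 5-minute windows (len 300), the short flow is counted once but the ssh record is not counted at all.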
>>
>> About +/-10: How am I supposed to use this?
>>
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t +10
>> Time Window error: No time slot information available
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t -10
>> Time Window error: No time slot information available
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t 2013/11/28.18:12:04+10
>> Time format error at '04+10': unexpected character: '+'.
>> > nfdump -r singleSSH.nfcapd -o 'fmt: %td %ts %te %sa:%sp %da:%dp' -t 2013/11/28.18:12:04-10
>> Time format error: '10' unexpected.
>>
>> 1: attached and at http://ge.tt/5Rthyl41/v/0?c
>>
>> --
>> Peter Valdemar Mørch
>> http://www.morch.com
>>
From d9ae3d94036639e9b08caa753bd1c867d39d27c8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Peter=20Valdemar=20M=C3=B8rch?= <[email protected]>
Date: Sun, 1 Dec 2013 00:30:36 +0100
Subject: [PATCH] Introduce -S option to make -t look only at start of flow
The -t <timewin> option only includes flows if both the flow's start
and end are inside timewin.
All of these:
http://sourceforge.net/p/nfdump/bugs/22/
http://sourceforge.net/p/nfdump/bugs/25/
and the email in nfdump-discuss with subject:
"How to use the nfdump -t <timewin> option for RRDTool integration?"
describe the same thing. They would like the -t option to only
consider one end of the flow. See one of them for a more thorough
explanation.
This patch achieves that. It introduces an -S option that makes -t
consider only whether the flow's start is inside timewin.
Instead of this pseudo-code:
consider(master_record->first < twin_start)
consider(master_record->last > twin_end)
does this pseudo-code:
consider(master_record->first < twin_start)
if (-S option)
consider(master_record->first >=twin_end)
else
consider(master_record->last >= twin_end)
Beware that the getopt() line lists all the options on one long
line, which is heavily prone to merge conflicts. Also, I've chosen
the -S option, since it was available and a handy mnemonic for
"start".
The -S option was introduced for backwards compatibility, so -t
alone continues to work as it always has.
---
bin/nfdump.c | 31 ++++++++++++++++++++++++-------
man/nfdump.1 | 5 +++++
2 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/bin/nfdump.c b/bin/nfdump.c
index b5ba188..93b145e 100644
--- a/bin/nfdump.c
+++ b/bin/nfdump.c
@@ -252,7 +252,7 @@ static void PrintSummary(stat_record_t *stat_record, int plain_numbers, int csv_
static stat_record_t process_data(char *wfile, int element_stat, int flow_stat, int sort_flows,
printer_t print_header, printer_t print_record, time_t twin_start, time_t twin_end,
- uint64_t limitflows, int tag, int compress, int do_xstat);
+ uint64_t limitflows, int tag, int compress, int do_xstat, int match_start);
/* Functions */
@@ -308,7 +308,8 @@ static void usage(char *name) {
"-X\t\tDump Filtertable and exit (debug option).\n"
"-Z\t\tCheck filter syntax and exit.\n"
"-t <time>\ttime window for filtering packets\n"
- "\t\tyyyy/MM/dd.hh:mm:ss[-yyyy/MM/dd.hh:mm:ss]\n", name);
+ "\t\tyyyy/MM/dd.hh:mm:ss[-yyyy/MM/dd.hh:mm:ss]\n"
+ "-S\t\ttime window (-t) considers only start\n", name);
} /* usage */
@@ -357,7 +358,7 @@ char bps_str[NUMBER_STRING_SIZE], pps_str[NUMBER_STRING_SIZE], bpp_str[NUMBER_
stat_record_t process_data(char *wfile, int element_stat, int flow_stat, int sort_flows,
printer_t print_header, printer_t print_record, time_t twin_start, time_t twin_end,
- uint64_t limitflows, int tag, int compress, int do_xstat) {
+ uint64_t limitflows, int tag, int compress, int do_xstat, int match_start) {
common_record_t *flow_record;
master_record_t *master_record;
nffile_t *nffile_w, *nffile_r;
@@ -570,7 +571,19 @@ int v1_map_done = 0;
// Time based filter
// if no time filter is given, the result is always true
- match = twin_start && (master_record->first < twin_start || master_record->last > twin_end) ? 0 : 1;
+ if (twin_start) {
+ if (master_record->first < twin_start)
+ match = 0;
+ else {
+ if (match_start) {
+ match = master_record->first >= twin_end ? 0:1;
+ } else {
+ match = master_record->last >= twin_end ? 0:1;
+ }
+ }
+ } else
+ match = 1;
+
match &= limitflows ? stat_record.numflows < limitflows : 1;
// filter netflow record with user supplied filter
@@ -716,7 +729,7 @@ char *rfile, *Rfile, *Mdirs, *wfile, *ffile, *filter, *tstring, *stat_type;
char *byte_limit_string, *packet_limit_string, *print_format, *record_header;
char *print_order, *query_file, *UnCompress_file, *nameserver, *aggr_fmt;
int c, ffd, ret, element_stat, fdump;
-int i, user_format, quiet, flow_stat, topN, aggregate, aggregate_mask, bidir;
+int i, user_format, quiet, flow_stat, topN, aggregate, aggregate_mask, bidir, match_start;
int print_stat, syntax_only, date_sorted, do_tag, compress, do_xstat;
int plain_numbers, GuessDir, pipe_output, csv_output;
time_t t_start, t_end;
@@ -730,6 +743,7 @@ char Ident[IDENTLEN];
fdump = aggregate = 0;
aggregate_mask = 0;
bidir = 0;
+ match_start = 0;
t_start = t_end = 0;
syntax_only = 0;
topN = 10;
@@ -767,7 +781,7 @@ char Ident[IDENTLEN];
for ( i=0; i<AGGR_SIZE; AggregateMasks[i++] = 0 ) ;
- while ((c = getopt(argc, argv, "6aA:Bbc:D:E:s:hHn:i:j:f:qzr:v:w:K:M:NImO:R:XZt:TVv:x:l:L:o:")) != EOF) {
+ while ((c = getopt(argc, argv, "6aA:Bbc:D:E:s:hHn:i:j:f:qzr:v:w:K:M:NImO:R:XZt:STVv:x:l:L:o:")) != EOF) {
switch (c) {
case 'h':
usage(argv[0]);
@@ -831,6 +845,9 @@ char Ident[IDENTLEN];
exit(255);
}
break;
+ case 'S':
+ match_start = 1;
+ break;
case 'V': {
char *e1, *e2;
e1 = "";
@@ -1172,7 +1189,7 @@ char Ident[IDENTLEN];
nfprof_start(&profile_data);
sum_stat = process_data(wfile, element_stat, aggregate || flow_stat, print_order != NULL,
print_header, print_record, t_start, t_end,
- limitflows, do_tag, compress, do_xstat);
+ limitflows, do_tag, compress, do_xstat, match_start);
nfprof_end(&profile_data, total_flows);
if ( total_bytes == 0 ) {
diff --git a/man/nfdump.1 b/man/nfdump.1
index 56e1bc9..bfaa67f 100755
--- a/man/nfdump.1
+++ b/man/nfdump.1
@@ -74,6 +74,11 @@ onwards. The time window may also be specified as +/\- n. In this case
it is relativ to the beginning or end of all flows. +10 means the first
10 seconds of all flows, \-10 means the last 10 seconds of all flows.
.TP 3
+.B -S
+Controls how the timewin is interpreted. If specified, the flow is included if
+the start of the flow is inside the timewin. Otherwise the entire flow needs to
+be inside timewin.
+.TP 3
.B -c \fInum
Limit number of records to process to the first \fInum\fR flows.
.TP 3
--
1.7.2.5
_______________________________________________
Nfdump-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nfdump-discuss