Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Karl O. Pinc


On 06/13/2009 03:49:07 PM, Paolo Lucente wrote:

> Hi Chris,
>
> On Sat, Jun 13, 2009 at 03:07:01PM -0500, Karl O. Pinc wrote:
>
> >> We are only interested in a single table.
> >
> > Why can't two separate sql plugins write to the same table?
>
> What Karl is proposing here might really result in a simpler
> approach compared to the sub-aggregation scenario - which, with
> some care (i.e. sql_startup_delay to avoid events synchronization
> while retaining the same sql_history and sql_refresh_time settings),
> can not only achieve the same results but, best of all, is already
> there. Let us know your thoughts!


A good database should not have problems with simultaneous updates,
or is there another reason why synchronization is an issue?
If database keys are a problem that should be easily dealt
with by adding another (optional) config parameter to specify
a unique part of a key.  (Although good database design says
that you don't put meaningful information into a key, which
makes the key issue moot but would still require another
database column to track the source (plugin "id") of the
data.)
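
To make the extra-column idea concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The table layout, the plugin_id column
and the plugin names "big" and "small" are all assumptions made up for
illustration; this is not pmacct's actual schema or configuration.

```python
import sqlite3

# Hypothetical schema: plugin_id records which plugin wrote the row, so
# two writers can store the same time bin and address without clashing.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE acct (
        plugin_id      TEXT    NOT NULL,
        stamp_inserted TEXT    NOT NULL,
        ip_src         TEXT    NOT NULL,
        bytes          INTEGER NOT NULL,
        UNIQUE (plugin_id, stamp_inserted, ip_src)
    )
    """
)


def write_rows(plugin_id, rows):
    """Insert one plugin's rows in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO acct VALUES (?, ?, ?, ?)",
            [(plugin_id, stamp, ip, nbytes) for stamp, ip, nbytes in rows],
        )


# Both writers produce a row for the same time bin and source address.
write_rows("big", [("2009-06-13 15:00:00", "10.0.0.1", 5000)])
write_rows("small", [("2009-06-13 15:00:00", "10.0.0.1", 40)])

total = conn.execute("SELECT SUM(bytes) FROM acct").fetchone()[0]
per_plugin = dict(
    conn.execute("SELECT plugin_id, SUM(bytes) FROM acct GROUP BY plugin_id")
)
print(total, per_plugin)
```

Queries that only care about totals can ignore plugin_id, while the
column still lets each writer's rows be told apart.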


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists


Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Paolo Lucente
Hi Chris,

On Sat, Jun 13, 2009 at 03:07:01PM -0500, Karl O. Pinc wrote:

>> We are only interested in a single table.
>
> Why can't two separate sql plugins write to the same table?

What Karl is proposing here might really result in a simpler
approach compared to the sub-aggregation scenario - which, with
some care (i.e. sql_startup_delay to avoid events synchronization
while retaining the same sql_history and sql_refresh_time settings),
can not only achieve the same results but, best of all, is already
there. Let us know your thoughts!

Cheers,
Paolo




Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Paolo Lucente
Hi Karl,

On Sat, Jun 13, 2009 at 03:03:04PM -0500, Karl O. Pinc wrote:

> What really is the constrained resource here?  Is it the number
> of transactions the database supports or is it something more
> fundamental to the pmacct performance like cpu or memory
> constraints?  What I'm thinking is that past a certain
> point having the system do more work to detect that it's
> under load becomes counter-productive.   Is that why
> you've not implemented such a feature for the in-memory
> tables or is it simply because it's up to the pmacct
> command to determine sample frequency?
>
> Just curious.

The constrained resource is the SQL database; not pmacct
itself, the cache size, or CPU or memory usage. This
is because database performance is sensitive to the rate
of insertion, number of tuples, index performance, etc.

A fundamental difference between the two backends is that
the SQL database is conceived to keep historical data (by
featuring timestamps) whereas in typical usage memory
tables are saved somewhere persistently (RRD files,
fed into 3rd party scripts/applications, etc.) and
wiped out at regular intervals.

That is why such a thing was not really implemented for
in-memory tables: after understanding the footprint of
the memory table in, say, a lab environment, it's not a
big deal to keep it healthy: it simply keeps working
smoothly as long as the hashing algorithm is able to
disperse data through the available buckets. 
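
One way to check that dispersion in a lab is to count how many entries
land in each bucket. The sketch below is only an illustration in Python:
zlib.crc32 stands in for whatever hash the memory table really uses, and
the key format and bucket count are invented.

```python
import zlib
from collections import Counter


def bucket_loads(keys, n_buckets):
    """Return the number of entries that land in each bucket."""
    counts = Counter(zlib.crc32(key.encode()) % n_buckets for key in keys)
    return [counts.get(i, 0) for i in range(n_buckets)]


# Synthetic aggregation keys of the form "src_ip:src_port".
keys = [f"10.0.{i // 256}.{i % 256}:{1024 + i}" for i in range(2000)]
loads = bucket_loads(keys, n_buckets=512)

print("entries:", sum(loads))
print("empty buckets:", loads.count(0))
print("longest chain:", max(loads))
```

A longest chain far above the average load (entries divided by buckets)
would suggest the keys are clustering rather than dispersing.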

Cheers,
Paolo




Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Karl O. Pinc


On 06/13/2009 09:58:17 AM, Chris Wilson wrote:

> Hi Paolo,
>
> On Sat, 13 Jun 2009, Paolo Lucente wrote:
>
> > Let me spend a couple of words on a different aspect: the above
> > approach implies everything ends in the same SQL table - which can
> > have pros and cons; the pro is simplicity (one table for
> > everything); the con is that one might want to have sub-aggregated
> > data clearly separated into a different table to, say, apply
> > different policies. This is something that can be done today with
> > pmacct as 'sql_preprocess' also offers the "max" version of the
> > "min" features you are using. It means having, for example, two SQL
> > plugins, writing to different SQL tables, aggregating data
> > differently and using complementary sql_preprocess features (so
> > that at the end by summing data in both tables one ends with the
> > full picture). Would this be a feasible approach to you?
>
> We are only interested in a single table.


Why can't two separate sql plugins write to the same table?


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein



Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Karl O. Pinc


On 06/13/2009 05:11:40 AM, Paolo Lucente wrote:

> Hi Chris,
>
> Aguri is slightly more limited in the fact it has only a set of
> (4?) traffic aggregation profiles whereas pmacct offers a wider
> range of primitives. But I guess the point you wanted to make was
> the dynamic variation of the sampling rate under increased traffic
> load (ie. DDoS).
>
> pmacct actually does have such a feature, available only to the SQL
> plugins: it's part of the SQL preprocess infrastructure (look for
> 'sql_preprocess' in the CONFIG-KEYS document or the wiki) and is
> called 'fsrc' (Flow Sampling under Resource Constraints). It is
> an implementation I did years ago loosely based on a paper coming
> from AT&T Labs. It aims at offering to the SQL database a sort of
> stream-lined number of aggregates; aggregates are weighted, ranked
> and sampled based on probability (which gives the dynamic/adaptive
> part of the approach); the resource constraint is expressed via
> the number of flows you want to end in the database (which is in
> turn seen as the constrained resource here).


What really is the constrained resource here?  Is it the number
of transactions the database supports or is it something more
fundamental to the pmacct performance like cpu or memory
constraints?  What I'm thinking is that past a certain
point having the system do more work to detect that it's
under load becomes counter-productive.   Is that why
you've not implemented such a feature for the in-memory
tables or is it simply because it's up to the pmacct
command to determine sample frequency?

Just curious.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein



Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Chris Wilson
Hi Paolo,

On Sat, 13 Jun 2009, Paolo Lucente wrote:

> > minb = 1, zero_dstip, minb = 1, zero_dstport, minb = 1, 
> > zero_srcport, minb = 1, zero_srcip
> > 
> > Then any flows which together do not add up to enough bytes to pass 
> > the minb filters, even after aggregation, end up in a record where all 
> > the selector fields are zeroed out. Since there is no final minb 
> > condition, this row would always be added to the database, never 
> > rejected, so SUM(bytes) would again equal the interface counters for 
> > any given time range.
> 
> I explored this valid approach some time ago (years!); by zeroing some
> aggregation primitives previously selected, duplicates are likely to be
> created. The trick is to "resolve" such duplicates before offering them
> to the SQL database - via a sub-aggregation operation. The cache is not
> sorted - making any sub-aggregation operation very expensive (scaling
> linearly with the number of aggregates being offered); the idea here is
> to index the cache, perform the sub-aggregation and offer the result of
> this to the SQL database. 

I agree that merging duplicate records would produce the most useful 
results for us.

> In summary, it's not something quick to do but it can be done - maybe 
> something good for inclusion within the 0.12 trunk later in the year. At 
> this stage, this feature can't be included in the first pre-release 
> version (0.12.0p1) but I can plan it along the rocky way to the first 
> official release, 0.12.0. Maybe already in 0.12.0p2. How does it sound?

That sounds great! I was not expecting you to offer to implement it so 
quickly. I understand that it's difficult and may conflict with your other 
priorities.

> Let me spend a couple of words on a different aspect: the above approach 
> implies everything ends in the same SQL table - which can have pros and 
> cons; the pro is simplicity (one table for everything); the con is that 
> one might want to have sub-aggregated data clearly separated into a 
> different table to, say, apply different policies. This is something that 
> can be done today with pmacct as 'sql_preprocess' also offers the "max" 
> version of the "min" features you are using. It means having, for 
> example, two SQL plugins, writing to different SQL tables, aggregating 
> data differently and using complementary sql_preprocess features (so 
> that at the end by summing data in both tables one ends with the full 
> picture). Would this be a feasible approach to you?

We are only interested in a single table. We can show "0.0.0.0" as 
"Aggregated out" in the pmGraph user interface. I'd rather that we didn't 
have to query five separate tables to get the results at different levels 
of aggregation, and merge them all together in our code. However I can see 
that some people would prefer to keep them in separate tables.

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.



Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Paolo Lucente
Hi Chris,

On Sat, Jun 13, 2009 at 02:08:56PM +0300, Chris Wilson wrote:

> I guess so; I was thinking that Aguri seems to store its output in text 
> files rather than a database, and perhaps provides more dynamic/automatic 
> filtering, but seems to be a research project and not highly supported or 
> maintained.

It seems it does this summary of summaries; such aggregate summaries
seem to be produced when the application receives a HUP signal.
It sounds like the application saves everything at its maximum
resolution, say with all primitives enabled, but then as a "frontend"
feature (presenting statistics) is able to aggregate it.

> We are using this feature to filter out small flows, but the problem is 
> that they are not accounted for at all, so the database contents e.g. 
> SUM(bytes) no longer reflect the interface totals.
> 
> What I would ideally like to see, though I realise that it's hard, is 
> something like this:
> 
> Initial filter selects flows over a certain size and non-selected flows 
> can either be discarded (as now) or reaggregated by zeroing a selected 
> feature, e.g. the destination port, and combined into a new single record 
> if there is more than one of them. These, more highly aggregated records 
> then continue down the preprocess chain, and if they fail to match a later 
> condition then they can be aggregated again in a different way, e.g. by 
> zeroing the destination IP address, and so on, until we end up with a 
> single record where all the features were aggregated.
> 
> For example, sql_preprocess might look something like this:
> 
> minb = 1, zero_dstip, minb = 1, zero_dstport, minb = 1, 
> zero_srcport, minb = 1, zero_srcip
> 
> Then any flows which together do not add up to enough bytes to pass the 
> minb filters, even after aggregation, end up in a record where all the 
> selector fields are zeroed out. Since there is no final minb condition, 
> this row would always be added to the database, never rejected, so 
> SUM(bytes) would again equal the interface counters for any given time 
> range.

I explored this valid approach some time ago (years!); by zeroing some
aggregation primitives previously selected, duplicates are likely to be
created. The trick is to "resolve" such duplicates before offering them
to the SQL database - via a sub-aggregation operation. The cache is not
sorted - making any sub-aggregation operation very expensive (scaling
linearly with the number of aggregates being offered); the idea here is
to index the cache, perform the sub-aggregation and offer the result of
this to the SQL database. 
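
A rough sketch of why the index matters, in Python. Neither function is
pmacct code; the tuple keys and the helper names merge_by_scan and
merge_by_index are made up for illustration. The first resolves
duplicates by scanning an unsorted list for every entry, the second uses
a hash index so each entry is merged with a single lookup.

```python
def merge_by_scan(entries):
    """Resolve duplicates by scanning the unsorted output for each entry."""
    out = []
    comparisons = 0
    for key, nbytes in entries:
        for slot in out:
            comparisons += 1
            if slot[0] == key:
                slot[1] += nbytes
                break
        else:
            # No existing slot matched: start a new aggregate.
            out.append([key, nbytes])
    return out, comparisons


def merge_by_index(entries):
    """Resolve duplicates through a dict keyed on the aggregation key."""
    index = {}
    for key, nbytes in entries:
        index[key] = index.get(key, 0) + nbytes
    return index


# After zeroing the destination port, two entries share the same key.
entries = [
    (("10.0.0.1", 0), 300),
    (("10.0.0.2", 0), 40),
    (("10.0.0.1", 0), 800),
]
scanned, comparisons = merge_by_scan(entries)
indexed = merge_by_index(entries)
print(scanned, comparisons, indexed)
```

Both give the same totals; the scan does work proportional to the number
of aggregates already emitted for each new entry, which is the cost the
index avoids.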

In summary, it's not something quick to do but it can be done - maybe
something good for inclusion within the 0.12 trunk later in the year. 
At this stage, this feature can't be included in the first pre-release
version (0.12.0p1) but I can plan it along the rocky way to the first
official release, 0.12.0. Maybe already in 0.12.0p2. How does it sound?

Let me spend a couple of words on a different aspect: the above approach
implies everything ends in the same SQL table - which can have pros and
cons; the pro is simplicity (one table for everything); the con is that
one might want to have sub-aggregated data clearly separated into a different
table to, say, apply different policies. This is something that can be done
today with pmacct as 'sql_preprocess' also offers the "max" version of
the "min" features you are using. It means having, for example, two SQL
plugins, writing to different SQL tables, aggregating data differently
and using complementary sql_preprocess features (so that at the end by
summing data in both tables one ends with the full picture). Would this
be a feasible approach to you?
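
The two-table idea can be sketched in Python as follows. This is a
simplification and not pmacct behaviour: the threshold is applied to the
same fine-grained record in both branches, whereas two independent
plugins would each filter their own aggregates. The threshold value, the
key layout and the function name split_tables are all hypothetical.

```python
from collections import defaultdict

THRESHOLD = 1000  # bytes; hypothetical cut-off shared by both branches


def split_tables(flows, threshold=THRESHOLD):
    """Return (detail_table, coarse_table) as dicts of key -> bytes."""
    detail = defaultdict(int)  # "min" side: full key, records >= threshold
    coarse = defaultdict(int)  # "max" side: source IP only, records < threshold
    for (src_ip, dst_ip, dst_port), nbytes in flows:
        if nbytes >= threshold:
            detail[(src_ip, dst_ip, dst_port)] += nbytes
        else:
            coarse[src_ip] += nbytes
    return dict(detail), dict(coarse)


flows = [
    (("10.0.0.1", "10.0.0.9", 80), 5000),
    (("10.0.0.1", "10.0.0.8", 80), 300),
    (("10.0.0.2", "10.0.0.9", 53), 40),
    (("10.0.0.1", "10.0.0.7", 22), 700),
]
detail, coarse = split_tables(flows)
print(detail)
print(coarse)
print(sum(detail.values()) + sum(coarse.values()))
```

Because every record goes to exactly one side of the threshold, summing
both tables gives back the full byte count.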

Cheers,
Paolo



Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Chris Wilson
Hi Paolo,

On Sat, 13 Jun 2009, Paolo Lucente wrote:

> Good pointer. From a brief scan of the Aguri homepage, and please feel 
> free to correct me if I'm wrong, I see many similarities between pmacct 
> and Aguri.

I guess so; I was thinking that Aguri seems to store its output in text 
files rather than a database, and perhaps provides more dynamic/automatic 
filtering, but seems to be a research project and not highly supported or 
maintained.

> Aguri is slightly more limited in the fact it has only a set of (4?) 
> traffic aggregation profiles whereas pmacct offers a wider range of 
> primitives. But I guess the point you wanted to make was the dynamic 
> variation of the sampling rate under increased traffic load (ie. DDoS).

OK, I didn't realise that it was just the sample rate that was varied. I 
thought it was to do with the flexible aggregation, e.g. if we have 1000 
flows with the same source IP and source port, they might be aggregated 
together as a single, more highly summarised flow.

> pmacct actually does have such a feature, available only to the SQL
> plugins: it's part of the SQL preprocess infrastructure (look for 
> 'sql_preprocess' in the CONFIG-KEYS document or the wiki) and is
> called 'fsrc' (Flow Sampling under Resource Constraints). It is
> an implementation I did years ago loosely based on a paper coming
> from AT&T Labs. It aims at offering to the SQL database a sort of
> stream-lined number of aggregates; aggregates are weighted, ranked
> and sampled based on probability (which gives the dynamic/adaptive
> part of the approach); the resource constraint is expressed via
> the number of flows you want to end in the database (which is in
> turn seen as the constrained resource here).

We are using this feature to filter out small flows, but the problem is 
that they are not accounted for at all, so the database contents e.g. 
SUM(bytes) no longer reflect the interface totals.

What I would ideally like to see, though I realise that it's hard, is 
something like this:

Initial filter selects flows over a certain size and non-selected flows 
can either be discarded (as now) or reaggregated by zeroing a selected 
feature, e.g. the destination port, and combined into a new single record 
if there is more than one of them. These, more highly aggregated records 
then continue down the preprocess chain, and if they fail to match a later 
condition then they can be aggregated again in a different way, e.g. by 
zeroing the destination IP address, and so on, until we end up with a 
single record where all the features were aggregated.

For example, sql_preprocess might look something like this:

minb = 1, zero_dstip, minb = 1, zero_dstport, minb = 1, 
zero_srcport, minb = 1, zero_srcip

Then any flows which together do not add up to enough bytes to pass the 
minb filters, even after aggregation, end up in a record where all the 
selector fields are zeroed out. Since there is no final minb condition, 
this row would always be added to the database, never rejected, so 
SUM(bytes) would again equal the interface counters for any given time 
range.
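
As a rough illustration of the chain I have in mind, here is a small
Python sketch. It is only a model of the proposal, not existing pmacct
behaviour; the field names, the min_bytes value and the cascade() helper
are invented for the example. Records below the threshold have one more
field zeroed, are merged with any duplicates this creates, and are then
tested again.

```python
from collections import defaultdict

FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port")
# Zeroing order taken from the example line: dstip, dstport, srcport, srcip.
ZERO_ORDER = ("dst_ip", "dst_port", "src_port", "src_ip")
ZERO_VALUE = {"src_ip": "0.0.0.0", "dst_ip": "0.0.0.0",
              "src_port": 0, "dst_port": 0}


def merge(records):
    """Sum the bytes of records that share the same key."""
    merged = defaultdict(int)
    for key, nbytes in records:
        merged[key] += nbytes
    return list(merged.items())


def cascade(records, min_bytes):
    """Keep records >= min_bytes; zero one more field on the rest and retry."""
    kept = []
    pending = merge(records)
    for field in ZERO_ORDER:
        pos = FIELDS.index(field)
        kept += [(key, b) for key, b in pending if b >= min_bytes]
        zeroed = [
            (key[:pos] + (ZERO_VALUE[field],) + key[pos + 1:], b)
            for key, b in pending
            if b < min_bytes
        ]
        pending = merge(zeroed)
    # No final threshold: whatever is left is always written.
    return kept + pending


flows = [
    (("10.0.0.1", 1000, "10.0.0.9", 80), 5000),
    (("10.0.0.1", 1001, "10.0.0.8", 80), 300),
    (("10.0.0.1", 1001, "10.0.0.7", 80), 800),
    (("10.0.0.2", 2000, "10.0.0.9", 53), 40),
]
rows = cascade(flows, min_bytes=1000)
for row in rows:
    print(row)
```

The point of the exercise is that the byte total of the output rows
always equals the byte total of the input, however many rows are folded
together.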

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.



Re: [pmacct-discussion] input/output interface information in nfacctd?

2009-06-13 Thread Paolo Lucente
Hi Suraj,

On Fri, Jun 12, 2009 at 02:55:58PM -0400, Suraj Nellikar (snellika) wrote:

> If I am understanding correctly here, the "in" and the "out" (input and
> output interface) values defined in the pre-tag are the snmp index
> values of the interfaces on the switch/router right? I see that the "in"
> and "out" being used here are defined as integers whereas the interfaces
> being exported are hex values. Can we change the data type in pmacct?

Yes, that's correct. But what do you mean by interfaces being
"exported" as hex values?

> We also know that Netflow v9 is customizable. i.e I can define my own
> set of statistics to be exported to the collector. Is there a way to
> define the customized set in the pmacct (nfacct), so that nfacctd
> identifies those packets accordingly and stores it in the sql database?
> Pre-tagging supports only a subset of these. 

Good point. I assume you mean you don't want to code in the customised
NetFlow v9 information (this would be easy given the modularity of this
part of the code) but rather you want to see in pmacct a way to define
customized information, via a configuration file, and then possibly
match them through the Pre-Tagging infrastructure. 

If this is the case, unfortunately, while it is a good idea, this is
not currently possible; I thought several times about it, not only in
terms of Pre-Tagging primitives but also, more importantly, in terms of
aggregation primitives. This is definitely on the radar but I have no
immediate plans to implement it, being busy with other juicy things
for the upcoming 0.12 trunk. But hey, contributions are warmly welcome!

Cheers,
Paolo


> -----Original Message-----
> From: pmacct-discussion-boun...@pmacct.net
> [mailto:pmacct-discussion-boun...@pmacct.net] On Behalf Of Karl O. Pinc
> Sent: Monday, June 08, 2009 6:35 PM
> To: Paolo Lucente; pmacct-discussion@pmacct.net
> Subject: Re: [pmacct-discussion] input/output interface information in
> nfacctd?
> 
> 
> On 06/08/2009 06:00:46 PM, Paolo Lucente wrote:
> > Hi Suraj,
> > 
> > This information is not immediately available within the
> > database or memory table; but you can match such fields
> > within the Pre-Tagging infrastructure to generate a tag
> > - which can, in turn, be either just used internally for
> > filtering or splitting data among the plugins or can be
> > made available to the backend via the 'tag' aggregation
> > primitive.
> 
> Good summary of what tagging is for.  Is there a place
> to drop it on the wiki?
> 
> > 
> > See 'examples/pretag.map.example' in the pmacct tarball;
> > also pre_tag_map and pre_tag_filter in the CONFIG-KEYS
> > document.
> > 
> > Cheers,
> > Paolo
> > 
> > 
> > On Mon, Jun 08, 2009 at 06:07:56PM -0400, Suraj Nellikar (snellika)
> > wrote:
> > > Hi,
> > >
> > > Doesn't the pmacct collector (I am running nfacctd) provide the Input
> > > Interface, Output Interface information which is present in the Netflow
> > > packets?
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Suraj.N
> > >
> > 
> 
> 
> Karl 
> Free Software:  "You don't pay back, you pay forward."
>   -- Robert A. Heinlein
> 



Re: [pmacct-discussion] Flexible aggregation

2009-06-13 Thread Paolo Lucente
Hi Chris,

Good pointer. From a brief scan of the Aguri homepage, and please feel
free to correct me if I'm wrong, I see many similarities between
pmacct and Aguri. 

Aguri is slightly more limited in the fact it has only a set of
(4?) traffic aggregation profiles whereas pmacct offers a wider
range of primitives. But I guess the point you wanted to make was
the dynamic variation of the sampling rate under increased traffic
load (ie. DDoS).

pmacct actually does have such a feature, available only to the SQL
plugins: it's part of the SQL preprocess infrastructure (look for 
'sql_preprocess' in the CONFIG-KEYS document or the wiki) and is
called 'fsrc' (Flow Sampling under Resource Constraints). It is
an implementation I did years ago loosely based on a paper coming
from AT&T Labs. It aims at offering to the SQL database a sort of
stream-lined number of aggregates; aggregates are weighted, ranked
and sampled based on probability (which gives the dynamic/adaptive
part of the approach); the resource constraint is expressed via
the number of flows you want to end in the database (which is in
turn seen as the constrained resource here).
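
For readers unfamiliar with this family of techniques, the general idea
can be sketched in Python as below. This is a generic size-dependent
("threshold") sampling scheme chosen for illustration; it is an
assumption, not the fsrc code in pmacct and not necessarily the
algorithm of the paper mentioned above. Aggregates of at least z bytes
are always kept, smaller ones are kept with probability bytes/z and
recorded as z bytes, and z is chosen so that the expected number of rows
matches the target.

```python
import random


def expected_rows(sizes, z):
    """Expected number of aggregates kept for threshold z."""
    return sum(min(1.0, s / z) for s in sizes)


def pick_threshold(sizes, target_rows):
    """Bisect for the z whose expected row count meets target_rows.

    Assumes every size is >= 1 byte and target_rows >= 1.
    """
    lo, hi = 1.0, float(max(sizes)) * len(sizes)
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected_rows(sizes, mid) > target_rows:
            lo = mid  # still too many rows: raise the threshold
        else:
            hi = mid
    return hi


def threshold_sample(aggregates, z, rng):
    """Keep large aggregates as-is; sample small ones and scale them to z."""
    sampled = []
    for key, nbytes in aggregates:
        if nbytes >= z:
            sampled.append((key, nbytes))
        elif rng.random() < nbytes / z:
            sampled.append((key, z))  # the expected value stays nbytes
    return sampled


aggregates = [("a", 10000), ("b", 5000), ("c", 100), ("d", 100),
              ("e", 50), ("f", 50), ("g", 20), ("h", 10)]
sizes = [nbytes for _, nbytes in aggregates]
z = pick_threshold(sizes, target_rows=4)
rows = threshold_sample(aggregates, z, random.Random(0))
print(z, rows)
```

The number of rows written varies from run to run around the target,
while the byte totals remain correct on average.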

Let me add: years ago I found it working for me, but perhaps it
lacks thorough testing; is anybody reading this email actually
using this feature (or has used it in the past) and able to provide
feedback? 

Thoughts?

Cheers,
Paolo

On Fri, Jun 12, 2009 at 09:42:22AM +0300, Chris Wilson wrote:
> Hi all,
> 
> Has anyone heard of Aguri?
> 
> "Aguri is an aggregation-based traffic profiler targeted for near 
> real-time, long-term, and wide-area traffic monitoring. Aguri adapts 
> itself to spatial traffic distribution by aggregating small volume flows 
> into aggregates, and achieves temporal aggregation by creating a summary 
> of summaries applying the same algorithm to its outputs. A set of scripts 
> are used for archiving and visualizing summaries in different time scales. 
> Aguri does not need a predefined rule set and is capable of detecting an 
> unexpected increase of unknown protocols or DoS attacks, which 
> considerably simplifies the task of network monitoring."
> 
> [http://www.sonycsl.co.jp/person/kjc/kjc/software.html]
> 
> I think I remember something like this  being posted to the list a while 
> back, so I'm sorry if this is a duplicate.
> 
> Has anyone considered implementing anything like this flexible aggregation 
> in pmacct? Could the code be taken from Aguri under BSD license?
> 
> Cheers, Chris.
> -- 
> Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
> The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES
> 
> Aptivate is a not-for-profit company registered in England and Wales
> with company number 04980791.
> 
