Re: [pmacct-discussion] Flexible aggregation
On 06/13/2009 03:49:07 PM, Paolo Lucente wrote:
> Hi Chris,
>
> On Sat, Jun 13, 2009 at 03:07:01PM -0500, Karl O. Pinc wrote:
> >> We are only interested in a single table.
> >
> > Why can't two separate sql plugins write to the same table?
>
> What Karl is proposing here might really result in a simpler approach
> compared to the sub-aggregation scenario - which, with some care (i.e.
> sql_startup_delay to avoid event synchronization while retaining the
> same sql_history and sql_refresh_time settings), can not only achieve
> the same results but, best of all, is already there. Let us know your
> thoughts!

A good database should not have problems with simultaneous updates, or is there another reason why synchronization is an issue?

If database keys are a problem, that should be easily dealt with by adding another (optional) config parameter to specify a unique part of a key. (Although good database design says that you don't put meaningful information into a key, which makes the key issue moot but would still require another database column to track the source (plugin "id") of the data.)

Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
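Karl's point about a source column can be illustrated with a small sketch. This is not pmacct code: it uses Python's sqlite3 module, and the table name `acct`, the `plugin_id` column and the plugin names `detail` and `coarse` are all hypothetical. It only shows that two writers can share one table without key collisions when the source is part of each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE acct (
        plugin_id      TEXT    NOT NULL,  -- hypothetical source column
        ip_src         TEXT    NOT NULL,
        ip_dst         TEXT    NOT NULL,
        stamp_inserted TEXT    NOT NULL,
        bytes          INTEGER NOT NULL,
        PRIMARY KEY (plugin_id, ip_src, ip_dst, stamp_inserted)
    )
    """
)


def write(plugin_id, rows):
    """Insert one writer's aggregates, tagging each row with its source."""
    conn.executemany(
        "INSERT INTO acct VALUES (?, ?, ?, ?, ?)",
        [(plugin_id, *row) for row in rows],
    )


stamp = "2009-06-13 15:00:00"
# Both writers emit the same (ip_src, ip_dst, stamp) combination.
write("detail", [("10.0.0.1", "10.0.0.2", stamp, 5000)])
write("coarse", [("10.0.0.1", "10.0.0.2", stamp, 120)])

total = conn.execute("SELECT SUM(bytes) FROM acct").fetchone()[0]
per_source = dict(
    conn.execute("SELECT plugin_id, SUM(bytes) FROM acct GROUP BY plugin_id")
)
```

Without `plugin_id` in the key, the second insert would violate the primary key; with it, the rows coexist and can still be summed together or per source.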
Re: [pmacct-discussion] Flexible aggregation
Hi Chris,

On Sat, Jun 13, 2009 at 03:07:01PM -0500, Karl O. Pinc wrote:
>> We are only interested in a single table.
>
> Why can't two separate sql plugins write to the same table?

What Karl is proposing here might really result in a simpler approach compared to the sub-aggregation scenario - which, with some care (i.e. sql_startup_delay to avoid event synchronization while retaining the same sql_history and sql_refresh_time settings), can not only achieve the same results but, best of all, is already there. Let us know your thoughts!

Cheers,
Paolo
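The staggering idea can be sketched as plain arithmetic. This is an illustration only, assuming that a startup delay shifts the first write and every later one by the same amount; the function name `write_times` and the numbers are made up and are not taken from pmacct.

```python
def write_times(startup_delay, refresh_time, horizon):
    """Seconds from start at which a writer would flush, up to horizon.

    Assumption: the first flush happens one refresh interval after the
    startup delay, and later flushes follow every refresh_time seconds.
    """
    return set(range(startup_delay + refresh_time, horizon + 1, refresh_time))


# Same refresh interval for both writers, second one delayed by 30 seconds.
plugin_a = write_times(0, 60, 300)
plugin_b = write_times(30, 60, 300)
```

With the same refresh interval and a delay that is not a multiple of it, the two sets of write times never overlap, so the two writers never hit the table at the same instant.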
Re: [pmacct-discussion] Flexible aggregation
Hi Karl,

On Sat, Jun 13, 2009 at 03:03:04PM -0500, Karl O. Pinc wrote:
> What really is the constrained resource here? Is it the number
> of transactions the database supports or is it something more
> fundamental to the pmacct performance like cpu or memory
> constraints? What I'm thinking is that past a certain
> point having the system do more work to detect that it's
> under load becomes counter-productive. Is that why
> you've not implemented such a feature for the in-memory
> tables or is it simply because it's up to the pmacct
> command to determine sample frequency?
>
> Just curious.

The constrained resource is the SQL database, not pmacct itself, its cache size, or its CPU or memory usage. This is because database performance is sensitive to the rate of insertion, the number of tuples, index performance, etc.

A fundamental difference between the two backends is that the SQL database is conceived to keep historical data (by featuring timestamps), whereas memory tables are typically saved somewhere persistent (RRD files, fed into 3rd party scripts/applications, etc.) and wiped out at regular intervals. That is why such a thing was not really implemented for in-memory tables: after understanding the footprint of the memory table in, say, a lab environment, it's not a big deal to keep it healthy: it simply keeps working smoothly as long as the hashing algorithm is able to disperse data through the available buckets.

Cheers,
Paolo
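The remark about dispersing data through the available buckets can be made concrete with a toy measurement. This is a generic hash-table illustration, not pmacct's memory table: the key format, the bucket count and the two hash functions (CRC32 and a deliberately poor length-based one) are assumptions chosen for the sketch.

```python
import zlib
from collections import Counter


def bucket_load(keys, n_buckets, hash_fn):
    """Count how many keys land in each used bucket."""
    return Counter(hash_fn(key) % n_buckets for key in keys)


# 4096 made-up keys that look like IP addresses.
keys = [f"10.0.{i // 256}.{i % 256}".encode() for i in range(4096)]

spread = bucket_load(keys, 1024, zlib.crc32)   # reasonable dispersion
clumped = bucket_load(keys, 1024, len)         # poor hash: key length only

longest_spread = max(spread.values())
longest_clumped = max(clumped.values())
```

The poor hash puts every key into one of only four buckets, so lookups degrade into long chain walks, while the CRC32-based one keeps chains short for the same data and table size.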
Re: [pmacct-discussion] Flexible aggregation
On 06/13/2009 09:58:17 AM, Chris Wilson wrote:
> Hi Paolo,
>
> On Sat, 13 Jun 2009, Paolo Lucente wrote:
> > Let me spend a couple of words on a different aspect: the above
> > approach implies everything ends up in the same SQL table, which has
> > pros and cons; the pro is simplicity (one table for everything); the
> > con is that one might want to have sub-aggregated data clearly
> > separated into a different table to, say, apply different policies.
> > This is something that can be done today with pmacct, as
> > 'sql_preprocess' also offers the "max" version of the "min" features
> > you are using. It means having, for example, two SQL plugins, writing
> > to different SQL tables, aggregating data differently and using
> > complementary sql_preprocess features (so that in the end, by summing
> > data in both tables, one ends up with the full picture). Would this
> > be a feasible approach for you?
>
> We are only interested in a single table.

Why can't two separate sql plugins write to the same table?

Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: [pmacct-discussion] Flexible aggregation
On 06/13/2009 05:11:40 AM, Paolo Lucente wrote:
> Hi Chris,
>
> Aguri is slightly more limited in that it has only a set of (4?)
> traffic aggregation profiles, whereas pmacct offers a wider range of
> primitives. But I guess the point you wanted to make was the dynamic
> variation of the sampling rate under increased traffic load (e.g.
> DDoS).
>
> pmacct actually does have such a feature, available only to the SQL
> plugins: it's part of the SQL preprocess infrastructure (look for
> 'sql_preprocess' in the CONFIG-KEYS document or the wiki) and is
> called 'fsrc' (Flow Sampling under Resource Constraints). It is an
> implementation I did years ago, loosely based on a paper coming from
> AT&T Labs. It aims at offering the SQL database a sort of streamlined
> number of aggregates; aggregates are weighted, ranked and sampled
> based on probability (which gives the dynamic/adaptive part of the
> approach); the resource constraint is expressed via the number of
> flows you want to end up in the database (which is in turn seen as
> the constrained resource here).

What really is the constrained resource here? Is it the number of transactions the database supports or is it something more fundamental to the pmacct performance like cpu or memory constraints?

What I'm thinking is that past a certain point having the system do more work to detect that it's under load becomes counter-productive. Is that why you've not implemented such a feature for the in-memory tables or is it simply because it's up to the pmacct command to determine sample frequency?

Just curious.

Regards,
Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: [pmacct-discussion] Flexible aggregation
Hi Paolo,

On Sat, 13 Jun 2009, Paolo Lucente wrote:
> > minb = 1, zero_dstip, minb = 1, zero_dstport, minb = 1,
> > zero_srcport, minb = 1, zero_srcip
> >
> > Then any flows which together do not add up to enough bytes to pass
> > the minb filters, even after aggregation, end up in a record where all
> > the selector fields are zeroed out. Since there is no final minb
> > condition, this row would always be added to the database, never
> > rejected, so SUM(bytes) would again equal the interface counters for
> > any given time range.
>
> I explored this valid approach some time ago (years!); by zeroing some
> aggregation primitives previously selected, duplicates are likely to be
> created. The trick is to "resolve" such duplicates before offering them
> to the SQL database, via a sub-aggregation operation. The cache is not
> sorted, making any sub-aggregation operation very expensive (scaling
> linearly with the number of aggregates being offered); the idea here is
> to index the cache, perform the sub-aggregation and offer the result of
> this to the SQL database.

I agree that merging duplicate records would produce the most useful results for us.

> In summary, it's not something quick to do but it can be done - maybe
> something good for inclusion within the 0.12 trunk later in the year. At
> this stage, this feature can't be included in the first pre-release
> version (0.12.0p1) but I can plan it along the rocky way to the first
> official release, 0.12.0. Maybe already in 0.12.0p2. How does it sound?

That sounds great! I was not expecting you to offer to implement it so quickly. I understand that it's difficult and may conflict with your other priorities.

> Let me spend a couple of words on a different aspect: the above
> approach implies everything ends up in the same SQL table, which has
> pros and cons; the pro is simplicity (one table for everything); the
> con is that one might want to have sub-aggregated data clearly
> separated into a different table to, say, apply different policies.
> This is something that can be done today with pmacct, as
> 'sql_preprocess' also offers the "max" version of the "min" features
> you are using. It means having, for example, two SQL plugins, writing
> to different SQL tables, aggregating data differently and using
> complementary sql_preprocess features (so that in the end, by summing
> data in both tables, one ends up with the full picture). Would this
> be a feasible approach for you?

We are only interested in a single table. We can show "0.0.0.0" as "Aggregated out" in the pmGraph user interface. I'd rather that we didn't have to query five separate tables to get the results at different levels of aggregation, and merge them all together in our code. However, I can see that some people would prefer to keep them in separate tables.

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.
Re: [pmacct-discussion] Flexible aggregation
Hi Chris,

On Sat, Jun 13, 2009 at 02:08:56PM +0300, Chris Wilson wrote:
> I guess so; I was thinking that Aguri seems to store its output in text
> files rather than a database, and perhaps provides more dynamic/automatic
> filtering, but seems to be a research project and not highly supported or
> maintained.

It seems it does this summary of summaries; such aggregate summaries seem to be produced when the application receives a HUP signal. It sounds like the application saves everything at its maximum resolution, say with all primitives enabled, but then, as a "frontend" feature (presenting statistics), is able to aggregate it.

> We are using this feature to filter out small flows, but the problem is
> that they are not accounted for at all, so the database contents e.g.
> SUM(bytes) no longer reflect the interface totals.
>
> What I would ideally like to see, but I realise that it's hard, is
> something like this:
>
> Initial filter selects flows over a certain size and non-selected flows
> can either be discarded (as now) or reaggregated by zeroing a selected
> feature, e.g. the destination port, and combined into a new single record
> if there is more than one of them. These, more highly aggregated records
> then continue down the preprocess chain, and if they fail to match a later
> condition then they can be aggregated again in a different way, e.g. by
> zeroing the destination IP address, and so on, until we end up with a
> single record where all the features were aggregated.
>
> For example, sql_preprocess might look something like this:
>
> minb = 1, zero_dstip, minb = 1, zero_dstport, minb = 1,
> zero_srcport, minb = 1, zero_srcip
>
> Then any flows which together do not add up to enough bytes to pass the
> minb filters, even after aggregation, end up in a record where all the
> selector fields are zeroed out. Since there is no final minb condition,
> this row would always be added to the database, never rejected, so
> SUM(bytes) would again equal the interface counters for any given time
> range.

I explored this valid approach some time ago (years!); by zeroing some aggregation primitives previously selected, duplicates are likely to be created. The trick is to "resolve" such duplicates before offering them to the SQL database, via a sub-aggregation operation. The cache is not sorted, making any sub-aggregation operation very expensive (scaling linearly with the number of aggregates being offered); the idea here is to index the cache, perform the sub-aggregation and offer the result of this to the SQL database.

In summary, it's not something quick to do but it can be done - maybe something good for inclusion within the 0.12 trunk later in the year. At this stage, this feature can't be included in the first pre-release version (0.12.0p1) but I can plan it along the rocky way to the first official release, 0.12.0. Maybe already in 0.12.0p2. How does it sound?

Let me spend a couple of words on a different aspect: the above approach implies everything ends up in the same SQL table, which has pros and cons; the pro is simplicity (one table for everything); the con is that one might want to have sub-aggregated data clearly separated into a different table to, say, apply different policies. This is something that can be done today with pmacct, as 'sql_preprocess' also offers the "max" version of the "min" features you are using. It means having, for example, two SQL plugins, writing to different SQL tables, aggregating data differently and using complementary sql_preprocess features (so that in the end, by summing data in both tables, one ends up with the full picture). Would this be a feasible approach for you?

Cheers,
Paolo
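The complementary min/max idea can be sketched in a few lines. This is an illustration, not pmacct's filtering code: the function name `split_by_threshold` is hypothetical, both filters are applied to the same set of aggregates, and which side receives an aggregate exactly at the threshold is an assumption. Whether the totals still match when the two plugins aggregate differently depends on details not covered here.

```python
def split_by_threshold(aggregates, threshold):
    """Partition aggregates as a 'min' filter and its 'max' complement would.

    aggregates maps an aggregation key to a byte count. The boundary case
    (bytes == threshold) is assigned to the 'large' side by assumption.
    """
    large = {key: b for key, b in aggregates.items() if b >= threshold}
    small = {key: b for key, b in aggregates.items() if b < threshold}
    return large, small


aggregates = {"a": 1500, "b": 20, "c": 1000, "d": 999}
large, small = split_by_threshold(aggregates, 1000)
grand_total = sum(large.values()) + sum(small.values())
```

Because every aggregate lands in exactly one of the two tables, summing both tables recovers the original total.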
Re: [pmacct-discussion] Flexible aggregation
Hi Paolo,

On Sat, 13 Jun 2009, Paolo Lucente wrote:
> Good pointer. From a brief scan of the Aguri homepage (please feel free
> to correct me if I'm wrong), I see many similarities between pmacct and
> Aguri.

I guess so; I was thinking that Aguri seems to store its output in text files rather than a database, and perhaps provides more dynamic/automatic filtering, but seems to be a research project and not highly supported or maintained.

> Aguri is slightly more limited in that it has only a set of (4?)
> traffic aggregation profiles, whereas pmacct offers a wider range of
> primitives. But I guess the point you wanted to make was the dynamic
> variation of the sampling rate under increased traffic load (e.g.
> DDoS).

OK, I didn't realise that it was just the sample rate that was varied. I thought it was to do with the flexible aggregation, e.g. if we have 1000 flows with the same source IP and source port, they might be aggregated together as a single, more highly summarised flow.

> pmacct actually does have such a feature, available only to the SQL
> plugins: it's part of the SQL preprocess infrastructure (look for
> 'sql_preprocess' in the CONFIG-KEYS document or the wiki) and is
> called 'fsrc' (Flow Sampling under Resource Constraints). It is an
> implementation I did years ago, loosely based on a paper coming from
> AT&T Labs. It aims at offering the SQL database a sort of streamlined
> number of aggregates; aggregates are weighted, ranked and sampled
> based on probability (which gives the dynamic/adaptive part of the
> approach); the resource constraint is expressed via the number of
> flows you want to end up in the database (which is in turn seen as
> the constrained resource here).

We are using this feature to filter out small flows, but the problem is that they are not accounted for at all, so the database contents e.g. SUM(bytes) no longer reflect the interface totals.

What I would ideally like to see, but I realise that it's hard, is something like this:

Initial filter selects flows over a certain size and non-selected flows can either be discarded (as now) or reaggregated by zeroing a selected feature, e.g. the destination port, and combined into a new single record if there is more than one of them. These, more highly aggregated records then continue down the preprocess chain, and if they fail to match a later condition then they can be aggregated again in a different way, e.g. by zeroing the destination IP address, and so on, until we end up with a single record where all the features were aggregated.

For example, sql_preprocess might look something like this:

minb = 1, zero_dstip, minb = 1, zero_dstport, minb = 1,
zero_srcport, minb = 1, zero_srcip

Then any flows which together do not add up to enough bytes to pass the minb filters, even after aggregation, end up in a record where all the selector fields are zeroed out. Since there is no final minb condition, this row would always be added to the database, never rejected, so SUM(bytes) would again equal the interface counters for any given time range.

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.
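The proposed filter-then-zero chain can be sketched as follows. This is a toy model of the proposal, not an existing pmacct feature: the `zero_*` steps are part of the proposal only, and the field names, the zero values, the threshold and the sample flows are all invented for the example.

```python
from collections import defaultdict

FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port")
# Order in which fields are zeroed, following the example chain above.
ZERO_ORDER = ("dst_ip", "dst_port", "src_port", "src_ip")
ZERO_VALUE = {"src_ip": "0.0.0.0", "src_port": 0, "dst_ip": "0.0.0.0", "dst_port": 0}


def cascade(flows, min_bytes):
    """Apply 'filter, then zero one field and merge' once per field.

    flows maps (src_ip, src_port, dst_ip, dst_port) to a byte count.
    Records below min_bytes are never dropped, only made coarser.
    """
    accepted = defaultdict(int)
    pending = dict(flows)
    for field in ZERO_ORDER:
        idx = FIELDS.index(field)
        rejected = defaultdict(int)
        for key, nbytes in pending.items():
            if nbytes >= min_bytes:
                accepted[key] += nbytes
            else:
                coarse = key[:idx] + (ZERO_VALUE[field],) + key[idx + 1:]
                rejected[coarse] += nbytes  # duplicates merge here
        pending = rejected
    # No final threshold: whatever is left is always kept.
    for key, nbytes in pending.items():
        accepted[key] += nbytes
    return dict(accepted)


flows = {
    ("10.0.0.1", 1234, "192.0.2.1", 80): 5000,
    ("10.0.0.1", 1234, "192.0.2.2", 80): 600,
    ("10.0.0.1", 1234, "192.0.2.3", 80): 500,
    ("10.0.0.2", 4321, "192.0.2.4", 53): 40,
    ("10.0.0.3", 999, "192.0.2.5", 25): 30,
}
result = cascade(flows, min_bytes=1000)
```

Five input flows become three records: the large flow unchanged, the two mid-sized flows merged once their destination IP is zeroed, and an all-zero record holding the remainder. The byte total is unchanged, which is the property the proposal is after. The dictionary keyed on the coarsened tuple plays the role of an index for merging duplicates.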
Re: [pmacct-discussion] input/output interface information in nfacctd?
Hi Suraj,

On Fri, Jun 12, 2009 at 02:55:58PM -0400, Suraj Nellikar (snellika) wrote:
> If I am understanding correctly here, the "in" and the "out" (input and
> output interface) values defined in the pre-tag are the snmp index
> values of the interfaces on the switch/router right? I see that the "in"
> and "out" being used here are defined as integers whereas the interfaces
> being exported are hex values. Can we change the data type in pmacct?

Yes, that's correct. But what do you mean by interfaces being "exported" as hex values?

> We also know that Netflow v9 is customizable. i.e. I can define my own
> set of statistics to be exported to the collector. Is there a way to
> define the customized set in the pmacct (nfacct), so that nfacctd
> identifies those packets accordingly and stores it in the sql database?
> Pre-tagging supports only a subset of these.

Good point. I assume you mean you don't want to code in the customised NetFlow v9 information (this would be easy given the modularity of this part of the code) but rather you want to see in pmacct a way to define customized information, via a configuration file, and then possibly match it through the Pre-Tagging infrastructure. If this is the case, unfortunately, while being a good idea, this is not currently possible; I thought several times about it, not only in terms of Pre-Tagging primitives but also, more importantly, in terms of aggregation primitives. This is definitely on the radar but I have no immediate plans to implement it, being busy with other juicy things for the upcoming 0.12 trunk. But hey, contributions are warmly welcome!

Cheers,
Paolo

> -----Original Message-----
> From: pmacct-discussion-boun...@pmacct.net
> [mailto:pmacct-discussion-boun...@pmacct.net] On Behalf Of Karl O. Pinc
> Sent: Monday, June 08, 2009 6:35 PM
> To: Paolo Lucente; pmacct-discussion@pmacct.net
> Subject: Re: [pmacct-discussion] input/output interface information in
> nfacctd?
>
> On 06/08/2009 06:00:46 PM, Paolo Lucente wrote:
> > Hi Suraj,
> >
> > This information is not immediately available within the
> > database or memory table; but you can match such fields
> > within the Pre-Tagging infrastructure to generate a tag
> > - which can, in turn, be either just used internally for
> > filtering or splitting data among the plugins or can be
> > made available to the backend via the 'tag' aggregation
> > primitive.
>
> Good summary of what tagging is for. Is there a place
> to drop it on the wiki?
>
> > See 'examples/pretag.map.example' in the pmacct tarball;
> > also pre_tag_map and pre_tag_filter in the CONFIG-KEYS
> > document.
> >
> > Cheers,
> > Paolo
> >
> > On Mon, Jun 08, 2009 at 06:07:56PM -0400, Suraj Nellikar (snellika)
> > wrote:
> > > Hi,
> > >
> > > Doesn't the pmacct collector (I am running nfacctd) provide the
> > > Input Interface, Output Interface information which is present in
> > > the Netflow packets?
> > >
> > > Thanks,
> > >
> > > Suraj.N
>
> Karl
> Free Software: "You don't pay back, you pay forward."
> -- Robert A. Heinlein
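The idea of matching input/output interface indexes to produce a tag can be sketched in a few lines. This is a conceptual model only, not pmacct's map parser or its file syntax: the rule layout, the first-match-wins behaviour and the default tag of 0 are assumptions made for the example. It also shows that an interface index displayed in hexadecimal is the same integer once parsed.

```python
def pre_tag(flow, rules):
    """Return the tag of the first rule whose fields all match, else 0.

    First-match-wins and the default of 0 are assumptions of this sketch.
    """
    for rule in rules:
        if all(flow.get(field) == value for field, value in rule["match"].items()):
            return rule["tag"]
    return 0


# Hypothetical rules keyed on SNMP interface indexes.
RULES = [
    {"tag": 10, "match": {"in": 2, "out": 5}},
    {"tag": 20, "match": {"in": 2}},
]

# An index shown as hex ("0x2") is the integer 2 once parsed.
flow = {"src_ip": "10.0.0.1", "in": int("0x2", 16), "out": 5}
tag = pre_tag(flow, RULES)
```

The resulting tag could then be used to filter or split flows, or be stored alongside the counters, as described in the quoted summary.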
Re: [pmacct-discussion] Flexible aggregation
Hi Chris,

Good pointer. From a brief scan of the Aguri homepage (please feel free to correct me if I'm wrong), I see many similarities between pmacct and Aguri.

Aguri is slightly more limited in that it has only a set of (4?) traffic aggregation profiles, whereas pmacct offers a wider range of primitives. But I guess the point you wanted to make was the dynamic variation of the sampling rate under increased traffic load (e.g. DDoS).

pmacct actually does have such a feature, available only to the SQL plugins: it's part of the SQL preprocess infrastructure (look for 'sql_preprocess' in the CONFIG-KEYS document or the wiki) and is called 'fsrc' (Flow Sampling under Resource Constraints). It is an implementation I did years ago, loosely based on a paper coming from AT&T Labs. It aims at offering the SQL database a sort of streamlined number of aggregates; aggregates are weighted, ranked and sampled based on probability (which gives the dynamic/adaptive part of the approach); the resource constraint is expressed via the number of flows you want to end up in the database (which is in turn seen as the constrained resource here).

Let me add: years ago I found it working for me, but perhaps it lacks thorough testing; is anybody reading this email actually using this feature (or did so in the past) and able to provide feedback? Thoughts?

Cheers,
Paolo

On Fri, Jun 12, 2009 at 09:42:22AM +0300, Chris Wilson wrote:
> Hi all,
>
> Has anyone heard of Aguri?
>
> "Aguri is an aggregation-based traffic profiler targeted for near
> real-time, long-term, and wide-area traffic monitoring. Aguri adapts
> itself to spatial traffic distribution by aggregating small volume flows
> into aggregates, and achieves temporal aggregation by creating a summary
> of summaries applying the same algorithm to its outputs. A set of scripts
> are used for archiving and visualizing summaries in different time scales.
> Aguri does not need a predefined rule set and is capable of detecting an
> unexpected increase of unknown protocols or DoS attacks, which
> considerably simplifies the task of network monitoring."
>
> [http://www.sonycsl.co.jp/person/kjc/kjc/software.html]
>
> I think I remember something like this being posted to the list a while
> back, so I'm sorry if this is a duplicate.
>
> Has anyone considered implementing anything like this flexible aggregation
> in pmacct? Could the code be taken from Aguri under BSD license?
>
> Cheers, Chris.
> --
> Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
> The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES
>
> Aptivate is a not-for-profit company registered in England and Wales
> with company number 04980791.
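The "weighted, ranked and sampled" description of sampling under a fixed budget can be illustrated with a generic technique known as priority sampling. This is not pmacct's 'fsrc' code and is not claimed to match it; the function name, the flow names and the byte counts are made up. Each flow gets a random priority proportional to its size, the k highest priorities are kept, and each kept flow is reported with an adjusted size so that totals remain unbiased on average.

```python
import random


def priority_sample(flows, k, rng=random):
    """Keep at most k flows and return {key: estimated_bytes}.

    flows maps a flow key to a positive byte count.
    """
    if k >= len(flows):
        return dict(flows)
    # Priority = bytes / u with u drawn from (0, 1]; large flows rank high.
    ranked = sorted(
        ((nbytes / (1.0 - rng.random()), key, nbytes) for key, nbytes in flows.items()),
        reverse=True,
    )
    tau = ranked[k][0]  # the (k+1)-th highest priority acts as the threshold
    # Small survivors are scaled up to tau to stand in for dropped flows.
    return {key: max(nbytes, tau) for _, key, nbytes in ranked[:k]}


flows = {f"flow{i}": 10 * (i + 1) for i in range(20)}
sample = priority_sample(flows, 5, random.Random(42))
```

The budget `k` corresponds to the number of rows one is willing to write per interval; large flows almost always survive with their true size, while small flows survive rarely and are inflated when they do.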