Re: [collectd] RFC: Changes to data sources and naming schema
On 24/09/2013, at 3:21 PM, Florian Forster o...@collectd.org wrote:

> Hi Jesse,
>
> On Tue, Sep 24, 2013 at 10:04:20AM +0930, Jesse Reynolds wrote:
> > 2) - I'm slightly confused ... can you give an example of how you'd do a disk utilisation threshold check? either the current absolute value needs to be known, or one at some previous point in time and the current value calculated from that.
>
> The same way you're doing it now, really.
>
> > It can't all be just rate data.
>
> I think here lies the misunderstanding: The proposal is not to make everything a rate, but make everything a gauge. For counters, that means converting them to a rate. Since nothing changes for gauges at all, I didn't spend any time discussing them. They are *not* going away and we do not propose to change partition usage or temperatures to rates.

Ahh! Excellent, thank you for clarifying that. Carry on then :-) Nothing to see here.

___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd
Re: [collectd] RFC: Changes to data sources and naming schema
Hi list,

First of all, shout-out to octo and jeremy katz for making the hackathon happen, great stuff and a great opportunity to meet you all.

Here are my answers and comments:

1) OK. Fully in favor. I don't think the extra disk space will be much of a problem, and it will greatly simplify the API.

2) OK. My gut initially said no, but only because I hadn't wrapped my head around the fact that gauge was still there and provided all necessary information. I struggled to find use cases where raw counter values would be interesting to have.

3) I am strongly in favor of solution 2, because it is the one that would allow the most flexible way of interacting with outputs other than rrdtool and graphite. Resolving to something resembling a path name is a task that mostly concerns:

- the csv output plugin
- the rrd output plugin
- the write_graphite output plugin

I think there is a way to make this work for these plugins as well, as discussed Saturday. The proposed way of doing it was to have plugins hint at the way a name could be constructed. The clear advantage of this approach is that an internal mangling DSL could use the fields, and it would ease interop with tools such as riemann, logstash or librato.

Serialisation is another debate :)

Cheers,
- pyr

On Mon, Sep 23, 2013 at 8:12 PM, Florian Forster o...@collectd.org wrote:

> [TLDR: Do you have a use-case for raw counter values?]
>
> Good morning everybody,
>
> we had a great time at the Hackathon [0] in Berlin yesterday. Thanks again to everyone!
>
> Amongst the ideas we discussed were some fundamental changes to the way metrics are represented. These ideas might eventually result in a collectd version 6, but don't hold your breath just yet – no actual coding has been done in that direction, we're just collecting design ideas at the moment.
>
> 1) Get rid of multiple data sources per metric.
>
> Some metrics, e.g. the if_octets metrics from the interface plugin and the load metric from the load plugin, have multiple data sources.
> The if_octets metric has data sources rx and tx for received and transmitted bytes. We would like to remove this functionality altogether. Rather than one metric with two values, we would like the interface plugin to create two metrics with one value each. Since version 5.0 this is mostly how metrics are defined and only a few cases are left; now we would like to actually remove the functionality. We reached a consensus on this, so it's essentially a done deal.
>
> Pro:
> * A lot of collectd code becomes a lot easier (fewer bugs)
> * A lot of front-end and graphing code becomes a lot easier (more and better front-ends)
> * Mapping of collectd metrics to names used by other systems, e.g. Graphite, is easier / more consistent
> * Splitting up existing RRD files by data source is a solved problem; writing a migration script is fairly simple
> * A point which causes much confusion for new users is resolved
>
> Contra:
> * Building a backwards compatibility layer for this is going to be hard
>
> 2) Calculate the rate of counters / DERIVEs early on and after that only handle gauge values.
>
> Right now, values come in four flavors: GAUGE and DERIVE, and two more special cases which are hardly ever used. These numbers are passed through the daemon as they are, i.e.:
>
> * The CPU plugin gets a counter of how many ticks / jiffies the CPU has spent in user mode since some unspecified time in the past.
> * This number is dispatched as a DERIVE type value.
> * The output plugins will write this absolute number.
>
> However, in the case of DERIVE (and COUNTER) values these actual absolute numbers are meaningless. In order to do anything meaningful with them, the difference between two values (and their respective times) is calculated, which results in the averaged _rate_ of change. This is what output plugins do if they have an enabled StoreRates setting. But not only there: Threshold checking, scaling, aggregation; all of these operate on the _rate_ rather than the absolute number.
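For illustration, the rate calculation described above (difference between two values divided by the difference of their times) could look roughly like the sketch below. This is not collectd's actual code; the function name `derive_rate` and the sample data are made up for the example, and the handling of a decreasing counter (returning no value) is an assumption.

```python
from typing import Optional, Tuple

# Hypothetical sample type: (unix timestamp in seconds, raw counter value).
Sample = Tuple[float, int]


def derive_rate(prev: Sample, curr: Sample) -> Optional[float]:
    """Return the average rate of change between two samples, or None."""
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0:
        # No time has passed (or clock went backwards): rate is undefined.
        return None
    if v1 < v0:
        # Assumption: a decreasing counter is treated as a reset and skipped.
        return None
    return (v1 - v0) / (t1 - t0)


# Made-up counter readings taken every 10 seconds; the last one is a reset.
samples = [(0.0, 1000), (10.0, 1500), (20.0, 2500), (30.0, 100)]
rates = [derive_rate(a, b) for a, b in zip(samples, samples[1:])]
print(rates)  # [50.0, 100.0, None]
```

After this step every consumer (thresholds, scaling, aggregation, output) would only ever see the gauge-like values in `rates`.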
> We would like to change the way DERIVEs are handled within collectd: Instead of keeping the original absolute values, we would like to calculate the rate as early as possible, possibly within the read plugins, and only handle the rate from there on.
>
> We only came up with one use case where having the raw counter values is beneficial: If you want to calculate the average rate over arbitrary time spans, it's easier to look up the raw counter values for those points in time and go from there. However, you can also sum up the individual rates to reach the same result. Finally, when handling counter resets / overflows within this interval, integrating over / summing rates is trivial by comparison.
>
> Do you have any other use-case for raw counter values?
>
> Pro:
> * Handling of values becomes easier.
> * The rate is calculated only once, in contrast to potentially several times, which might be more efficient
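As a rough check of the "sum up the individual rates" argument above: if each stored rate is weighted by the length of its interval, the average over a time span comes out the same as the one computed from the raw counter values at both ends. The sketch below uses made-up sample data and hypothetical helper names; with equally spaced samples the weighting reduces to a plain mean.

```python
import math

# Made-up (timestamp, raw counter) readings with unequal intervals.
samples = [(0.0, 1000), (10.0, 1500), (25.0, 3000), (30.0, 3200)]


def interval_rates(points):
    """Return a list of (duration, rate) for consecutive samples."""
    result = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        dt = t1 - t0
        result.append((dt, (v1 - v0) / dt))
    return result


def average_from_rates(pairs):
    """Time-weighted average of the per-interval rates."""
    total_time = sum(dt for dt, _ in pairs)
    return sum(dt * rate for dt, rate in pairs) / total_time


def average_from_counters(points):
    """Average rate from the first and last raw counter values only."""
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


from_rates = average_from_rates(interval_rates(samples))
from_counters = average_from_counters(samples)
print(math.isclose(from_rates, from_counters))  # True
```

If a counter reset fell inside the span, the rate-based version would only need to leave out the affected interval, whereas the raw-counter version would have to detect and compensate for the reset.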
Re: [collectd] RFC: Changes to data sources and naming schema
It's a good idea to change this naming schema. I like the first alternative, a path: really simple, and easy to draw, group ...

I don't understand how you can have a JSON object with RRD/filesystem storage. Will you move to NoSQL-only storage?

If you use a path, types.db can be limited to derive/counter, and the labels can be in the path itself or in the filename. That way we will not have to deploy it on all nodes when we need a new type.

Today I have to hack it for GenericJMX, Curl, some python code ...

I'm using this (https://github.com/Poil/CGraphz/wiki/CGraphz%20Naming%20Schema):

* host
* plugin
* plugin category (custom optional)
* plugin instance (optional)
* type
* type category (custom optional)
* type instance (optional)

PluginCategory is used to separate GenericJMX|varnish|curl_json|curl|curl_xml|P2000|tcpconns
TypeCategory is used to separate some custom plugins (GenericJMX|elasticsearch|P2000)

Regards,
Re: [collectd] RFC: Changes to data sources and naming schema
resending, forgot list

On Mon, Sep 23, 2013 at 9:54 PM, Pierre-Yves Ritschard p...@spootnik.org wrote:

> The idea is to have a simple way of naming things: the identity of a metric will be defined by the keys in the attr section.
>
> To generate path names for graphite or rrd plugins, the write output plugin would look for special expected keys (source, metric). Input plugins could additionally hint at the way to format their name by specifying a list of keys to look up (e.g. format: [ source, cpu-type, cpu-id ]).
>
> This would actually make the metric names of such plugins as IPMI or GenericJMX much cleaner, especially with graphite, since right now it's a mess of arbitrary-length trees.
>
> On Mon, Sep 23, 2013 at 9:45 PM, Poil p...@quake.fr wrote:
>
> > It's a good idea to change this naming schema. I like the first alternative, a path: really simple, and easy to draw, group ...
> >
> > I don't understand how you can have a JSON object with RRD/filesystem storage. Will you move to NoSQL-only storage?
> >
> > If you use a path, types.db can be limited to derive/counter, and the labels can be in the path itself or in the filename. That way we will not have to deploy it on all nodes when we need a new type.
> >
> > Today I have to hack it for GenericJMX, Curl, some python code ...
> >
> > I'm using this (https://github.com/Poil/CGraphz/wiki/CGraphz%20Naming%20Schema):
> >
> > - host
> > - plugin
> > - plugin category (custom optional)
> > - plugin instance (optional)
> > - type
> > - type category (custom optional)
> > - type instance (optional)
> >
> > PluginCategory is used to separate GenericJMX|varnish|curl_json|curl|curl_xml|P2000|tcpconns
> > TypeCategory is used to separate some custom plugins (GenericJMX|elasticsearch|P2000)
> >
> > Regards,
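To make the key/value naming idea with a format hint a bit more concrete, here is a minimal sketch of how a write plugin might turn a metric's attributes into a Graphite-style path. Nothing here is existing collectd API: the function `build_name`, the default key list and the sanitising rule are assumptions for illustration; only the keys source, metric, cpu-type and cpu-id come from the example in the mail.

```python
from typing import Dict, List, Optional

# Assumption: the "special expected keys" used when no format hint is given.
DEFAULT_KEYS = ["source", "metric"]


def build_name(attrs: Dict[str, str],
               fmt: Optional[List[str]] = None,
               sep: str = ".") -> str:
    """Join the attribute values selected by the format hint into a path."""
    keys = fmt if fmt is not None else DEFAULT_KEYS
    parts = []
    for key in keys:
        value = attrs.get(key)
        if value is None:
            continue  # missing keys are skipped in this sketch
        # Assumption: the separator inside a value is replaced with "_".
        parts.append(value.replace(sep, "_"))
    return sep.join(parts)


# Hypothetical metric identity as unordered key/value pairs.
attrs = {"source": "web01", "metric": "cpu", "cpu-type": "user", "cpu-id": "0"}

print(build_name(attrs))                                     # web01.cpu
print(build_name(attrs, ["source", "cpu-type", "cpu-id"]))   # web01.user.0
```

Outputs that understand key/value pairs natively could ignore the hint and pass `attrs` through unchanged; only path-based outputs (csv, rrd, write_graphite) would need something like `build_name`.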
Re: [collectd] RFC: Changes to data sources and naming schema
1) - yay :-)

2) - I'm slightly confused ... can you give an example of how you'd do a disk utilisation threshold check? either the current absolute value needs to be known, or one at some previous point in time and the current value calculated from that. It can't all be just rate data. Or are we proposing that 'dumb' threshold checks like this are old-hat and an annoyance? :-) ... Even still, probably the ideal disk utilisation check would consider both rate and current absolute value.

3) option 2 (unordered key-value pairs) seems most flexible and ideal in terms of constructing interesting views of the data

Nice work hackathoners :-)

Cheers
Jesse
Re: [collectd] RFC: Changes to data sources and naming schema
Hi Jesse,

On Tue, Sep 24, 2013 at 10:04:20AM +0930, Jesse Reynolds wrote:
> 2) - I'm slightly confused ... can you give an example of how you'd do a disk utilisation threshold check? either the current absolute value needs to be known, or one at some previous point in time and the current value calculated from that.

The same way you're doing it now, really.

> It can't all be just rate data.

I think here lies the misunderstanding: The proposal is not to make everything a rate, but make everything a gauge. For counters, that means converting them to a rate. Since nothing changes for gauges at all, I didn't spend any time discussing them. They are *not* going away and we do not propose to change partition usage or temperatures to rates.

Best regards,
-octo
--
collectd – The system statistics collection daemon
Website: http://collectd.org
Google+: http://collectd.org/+
GitHub: https://github.com/collectd
Twitter: http://twitter.com/collectd