Re: [collectd] RFC: Changes to data sources and naming schema

2013-09-24 Thread Jesse Reynolds

On 24/09/2013, at 3:21 PM, Florian Forster o...@collectd.org wrote:

 Hi Jesse,
 
 On Tue, Sep 24, 2013 at 10:04:20AM +0930, Jesse Reynolds wrote:
 2) - I'm slightly confused ... can you give an example of how you'd do
 a disk utilisation threshold check? either the current absolute
 value needs to be known, or one at some previous point in time
 and the current value calculated from that.
 
 The same way you're doing it now, really.
 
 It can't all be just rate data.
 
 I think here lies the misunderstanding: The proposal is not to make
 everything a rate, but make everything a gauge. For counters, that
 means converting them to a rate. Since nothing changes for gauges at
 all, I didn't spend any time discussing them. They are *not* going away
 and we do not propose to change partition usage or temperatures to
 rates.

Ahh! Excellent, thank you for clarifying that. Carry on then :-) Nothing to see 
here. 



___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd


Re: [collectd] RFC: Changes to data sources and naming schema

2013-09-23 Thread Pierre-Yves Ritschard
Hi list,

First of all, shout-out to octo and Jeremy Katz for making the hackathon
happen, great stuff and a great opportunity to meet you all.

Here are my answers and comments:

1) OK. Fully in favor, I don't think the extra disk space will be much of a
problem, it will greatly simplify the API.
2) OK. My gut initially said no, but only because I hadn't wrapped my
head around the fact that gauges are still there and provide all the
necessary information. I struggled to find use cases where raw counter
values would be interesting to have.
3) I am strongly in favor of solution 2, because it is the one that would
allow the most flexible way of interacting with outputs other than rrdtool
and graphite. Resolving to something resembling a path name is a task that
mostly concerns:
  - the csv output plugin
  - the rrd output plugin
  - the write_graphite output plugin

I think there is a way to make this work for these plugins as well, as
discussed on Saturday.

The proposed way of doing it was to have plugins hint at the way a name
could be constructed. The clear advantage of this approach is that an
internal mangling DSL could use the fields, and it would ease interop with
tools such as riemann, logstash or librato.

Serialisation is another debate :)

Cheers,
  - pyr


On Mon, Sep 23, 2013 at 8:12 PM, Florian Forster o...@collectd.org wrote:

 [TLDR: Do you have a use-case for raw counter values?]

 Good morning everybody,

 we had a great time at the Hackathon [0] in Berlin yesterday. Thanks
 again to everyone!

 Amongst the ideas we discussed were some fundamental changes to the way
 metrics are represented. These ideas might eventually result in a
 collectd version 6, but don't hold your breath just yet – no actual coding has
 been done in that direction, we're just collecting design ideas at the
 moment.


 1) Get rid of multiple data sources per metric.

 Some metrics, e.g. the if_octets metrics from the interface plugin
 and the load metric from the load plugin have multiple data
 sources. The if_octets metric has data sources rx and tx for
 received and transmitted bytes.

 We would like to remove this functionality altogether. Rather than one
 metric with two values, we would like the interface plugin to create
 two metrics with one value each. Since version 5.0 this is mostly how
 metrics are defined and only a few cases are left; now we would like to
 actually remove the functionality. We reached a consensus on this so
 it's essentially a done deal.

 Pro:

   * A lot of collectd code becomes a lot easier (fewer bugs)
   * A lot of front-end and graphing code becomes a lot easier (more
 and better front-ends)
   * Mapping of collectd metrics to names used by other systems,
 e.g. Graphite, is easier / more consistent
   * Splitting up existing RRD files by data source is a solved
 problem; writing a migration script is fairly simple
   * A point which causes much confusion for new users is resolved

 Contra:

   * Building a backwards compatibility layer for this is going to be
 hard
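
To illustrate proposal 1, here is a minimal sketch of splitting one metric with several data sources into several metrics with one value each. The `Metric` class and `split_data_sources` function are hypothetical names made up for this example, not collectd data structures, and putting the data source name into the type instance is only one possible mapping.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Metric:
    """Hypothetical single-value metric; not a collectd structure."""
    host: str
    plugin: str
    plugin_instance: str
    type: str
    type_instance: str
    value: float


def split_data_sources(host: str, plugin: str, plugin_instance: str,
                       type_: str, values: Dict[str, float]) -> List[Metric]:
    """Turn one multi-value metric into one Metric per data source.

    Assumption: the data source name (e.g. "rx", "tx") becomes the
    type instance of the new single-value metric.
    """
    return [
        Metric(host, plugin, plugin_instance, type_, ds_name, value)
        for ds_name, value in values.items()
    ]


# if_octets on eth0 currently carries two data sources, rx and tx.
metrics = split_data_sources(
    "web01", "interface", "eth0", "if_octets", {"rx": 1200.0, "tx": 300.0}
)
for m in metrics:
    print(m)
```

Each resulting metric can then be named, stored and graphed on its own, which is where the simplification for front-ends would come from.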


 2) Calculate the rate of counters / DERIVEs early on and after that only
handle gauge values.

 Right now, values come in four flavors: GAUGE and DERIVE, and two more
 special cases which are hardly ever used. These numbers are passed
 through the daemon as they are, i.e.:
   * The CPU plugin gets a counter of how many ticks / jiffies the CPU
 has spent in user mode since some unspecified time in the past.
   * This number is dispatched as a DERIVE type value.
   * The output plugins will write this absolute number.

 However, in the case of DERIVE (and COUNTER) values these actual
 absolute numbers are meaningless. In order to do anything meaningful
 with them, the difference between two values (and their respective
 times) is calculated, which results in the averaged _rate_ of change.
 This is what output plugins do if they have an enabled StoreRates
 setting. But not only there: Threshold checking, scaling, aggregation;
 all of these operate on the _rate_ rather than the absolute number.

 We would like to change the way DERIVEs are handled within collectd:
 Instead of keeping the original absolute values, we would like to
 calculate the rate as early as possible, possibly within the read
 plugins, and only handle the rate from there on.
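
As a rough sketch of what "calculate the rate as early as possible" could look like, the function below derives a per-second rate from two successive raw samples. The names `Sample` and `derive_to_rate` are hypothetical. Treating a decreasing value as a reset and returning no rate is a simplifying assumption for this example, not a description of how collectd handles DERIVE or COUNTER values.

```python
from typing import Optional, Tuple

# (unix timestamp in seconds, raw counter value)
Sample = Tuple[float, int]


def derive_to_rate(prev: Sample, curr: Sample) -> Optional[float]:
    """Return the average rate of change per second between two samples.

    Returns None when no rate can be computed: the timestamps do not
    advance, or the value went backwards (assumed here to be a reset).
    """
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0 or v1 < v0:
        return None
    return (v1 - v0) / (t1 - t0)


# 500 jiffies spent in user mode over 10 seconds.
print(derive_to_rate((0.0, 1000), (10.0, 1500)))
# A smaller value than before is treated as a reset: no rate.
print(derive_to_rate((10.0, 1500), (20.0, 100)))
```

After this step only the rate would travel through the daemon, so output plugins, thresholds and aggregation would all see the same number.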

 We only came up with one use case where having the raw counter values is
 beneficial: If you want to calculate the average rate over arbitrary
 time spans, it's easier to look up the raw counter values for those
 points in time and go from there. However, you can also sum up the
 individual rates to reach the same result. Finally, when handling
 counter resets / overflows within this interval, integrating over /
 summing rates is trivial by comparison.
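
The claim that summing rates gives the same result as looking up raw counter values can be checked with a small example. This is a sketch with hypothetical names (`RateInterval`, `average_rate`). Note that "summing" here means weighting each rate by the length of its interval; with equally long intervals this reduces to a plain mean of the rates.

```python
from typing import List, Tuple

# (interval start, interval end, rate per second during that interval)
RateInterval = Tuple[float, float, float]


def average_rate(intervals: List[RateInterval]) -> float:
    """Time-weighted average of per-interval rates."""
    total_time = sum(end - start for start, end, _ in intervals)
    if total_time <= 0:
        raise ValueError("intervals must cover a positive time span")
    total_change = sum(rate * (end - start) for start, end, rate in intervals)
    return total_change / total_time


# Raw counter samples: (t=0, 100), (t=10, 600), (t=30, 1000).
# Stored rates: 50/s over [0, 10] and 20/s over [10, 30].
from_rates = average_rate([(0.0, 10.0, 50.0), (10.0, 30.0, 20.0)])
from_raw = (1000 - 100) / (30.0 - 0.0)
print(from_rates, from_raw)  # both are 30.0
```

An interval in which the counter was reset simply has no stored rate and can be left out of the list, whereas the raw-value approach would have to detect and correct the reset first.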

 Do you have any other use-case for raw counter values?

 Pro:

   * Handling of values becomes easier.
   * The rate is calculated only once, in contrast to potentially several
 times, which might be more efficient 

Re: [collectd] RFC: Changes to data sources and naming schema

2013-09-23 Thread Poil

It's a good idea to change this naming schema.

I like the first alternative, a path: really simple, and easy to draw,
group ...
I don't understand how you can have a JSON object with RRD/Filesystem;
will you move to NoSQL-only storage?


If you use a path, types.db can be limited to derive/counter, and the labels
can be in the path itself or in the filename; that way we will not
have to deploy it on all nodes when we need a new type.


Today I have to hack it for GenericJMX, Curl, some python code ...
I'm using this 
(https://github.com/Poil/CGraphz/wiki/CGraphz%20Naming%20Schema) :


 * host
 * plugin
 * plugin category (custom optional)
 * plugin instance (optional)
 * type
 * type category (custom optional)
 * type instance (optional)

PluginCategory is used to separate 
GenericJMX|varnish|curl_json|curl|curl_xml|P2000|tcpconns
TypeCategory is used to separate some custom plugins
(GenericJMX|elasticsearch|P2000)
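
The seven fields listed above could be modelled roughly as follows. This is only an illustrative sketch: the `Identifier` class, the "/" separator and the rule of dropping empty optional fields are assumptions made for this example and are not taken from CGraphz or collectd.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    """Hypothetical container for the seven naming fields."""
    host: str
    plugin: str
    type: str
    plugin_category: Optional[str] = None   # custom, optional
    plugin_instance: Optional[str] = None   # optional
    type_category: Optional[str] = None     # custom, optional
    type_instance: Optional[str] = None     # optional

    def path(self, sep: str = "/") -> str:
        """Join the fields in the listed order, skipping unset ones."""
        parts = [
            self.host,
            self.plugin,
            self.plugin_category,
            self.plugin_instance,
            self.type,
            self.type_category,
            self.type_instance,
        ]
        return sep.join(p for p in parts if p)


ident = Identifier(
    host="web01",
    plugin="GenericJMX",
    type="memory",
    plugin_category="tomcat",
    type_instance="heap",
)
print(ident.path())  # web01/GenericJMX/tomcat/memory/heap
```

Skipping unset fields keeps paths short, but it also means a path alone no longer tells which optional field a component came from; a real schema would need a convention for that.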


Regards,





Re: [collectd] RFC: Changes to data sources and naming schema

2013-09-23 Thread Pierre-Yves Ritschard
resending, forgot list


On Mon, Sep 23, 2013 at 9:54 PM, Pierre-Yves Ritschard p...@spootnik.org wrote:

 The idea is to have a simple way of naming things, the identity of a
 metric will be defined by the keys in the attr section.
 To generate path names for graphite or rrd plugins, the write output
 plugin would look for special expected keys (source, metric). Input plugins
 could additionally hint at the way to format their name by specifying a
 list of keys to look up (e.g. format: [source, cpu-type, cpu-id]).

 This would actually make the metric names of plugins such as IPMI or
 GenericJMX much cleaner, especially with graphite, since right now it's a
 mess of arbitrary-length trees.
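
A minimal sketch of the idea described above: the identity of a metric is a set of key-value attributes, and a path-based output plugin builds a name from either the default keys (source, metric) or a format hint supplied by the input plugin. The function names, the "." separator and the sanitising rule are assumptions for illustration; nothing here is an existing collectd API.

```python
import re
from typing import Dict, List, Optional

# Keys a path-based writer would fall back to without a hint.
DEFAULT_KEYS = ["source", "metric"]


def sanitize(value: str, sep: str) -> str:
    """Replace the separator and whitespace so a value stays one component."""
    return re.sub(r"\s", "_", value.replace(sep, "_"))


def format_path(attrs: Dict[str, str],
                format_hint: Optional[List[str]] = None,
                sep: str = ".") -> str:
    """Build a path name from attributes, honouring an optional hint."""
    keys = format_hint if format_hint else DEFAULT_KEYS
    missing = [k for k in keys if k not in attrs]
    if missing:
        raise KeyError("missing attribute(s): " + ", ".join(missing))
    return sep.join(sanitize(attrs[k], sep) for k in keys)


attrs = {"source": "web01", "metric": "cpu", "cpu-type": "user", "cpu-id": "0"}
print(format_path(attrs))                                    # web01.cpu
print(format_path(attrs, ["source", "cpu-type", "cpu-id"]))  # web01.user.0
```

Outputs that understand key-value data would pass `attrs` through unchanged; only the path-based writers need the hint.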





Re: [collectd] RFC: Changes to data sources and naming schema

2013-09-23 Thread Jesse Reynolds
1) - yay :-)

2) - I'm slightly confused ... can you give an example of how you'd do a disk
utilisation threshold check? Either the current absolute value needs to be
known, or one at some previous point in time and the current value calculated
from that. It can't all be just rate data. Or are we proposing that 'dumb'
threshold checks like this are old hat and an annoyance? :-) ... Even so,
probably the ideal disk utilisation check would consider both rate and current
absolute value.

3) option 2 (unordered key-value pairs) seems most flexible and ideal in terms 
of constructing interesting views of the data

Nice work hackathoners :-)

Cheers
Jesse





Re: [collectd] RFC: Changes to data sources and naming schema

2013-09-23 Thread Florian Forster
Hi Jesse,

On Tue, Sep 24, 2013 at 10:04:20AM +0930, Jesse Reynolds wrote:
 2) - I'm slightly confused ... can you give an example of how you'd do
  a disk utilisation threshold check? either the current absolute
  value needs to be known, or one at some previous point in time
  and the current value calculated from that.

The same way you're doing it now, really.

 It can't all be just rate data.

I think here lies the misunderstanding: The proposal is not to make
everything a rate, but make everything a gauge. For counters, that
means converting them to a rate. Since nothing changes for gauges at
all, I didn't spend any time discussing them. They are *not* going away
and we do not propose to change partition usage or temperatures to
rates.
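
To make the distinction concrete, here is a small sketch under stated assumptions: gauge values pass through unchanged, counters are converted to a rate, and a threshold check then only ever sees gauge-like numbers. The names `as_gauge` and `exceeds` are hypothetical, and counter wrap-around is deliberately ignored.

```python
from typing import Optional, Tuple

# (unix timestamp in seconds, value)
Sample = Tuple[float, float]


def as_gauge(ds_type: str, curr: Sample,
             prev: Optional[Sample] = None) -> Optional[float]:
    """Return the value a threshold check would operate on.

    GAUGE values (disk usage, temperature) are returned as they are.
    DERIVE/COUNTER values are converted to a per-second rate, which
    needs a previous sample; without one there is no value yet.
    """
    if ds_type == "GAUGE":
        return curr[1]
    if ds_type in ("DERIVE", "COUNTER"):
        if prev is None or curr[0] <= prev[0]:
            return None
        return (curr[1] - prev[1]) / (curr[0] - prev[0])
    raise ValueError("unsupported data source type: " + ds_type)


def exceeds(value: Optional[float], warning_max: float) -> bool:
    """True if a value is present and above the threshold."""
    return value is not None and value > warning_max


# Partition usage in percent stays an absolute gauge value.
print(exceeds(as_gauge("GAUGE", (100.0, 87.5)), warning_max=80.0))
# A counter becomes a rate (here 500 per second) before it is checked.
print(as_gauge("DERIVE", (110.0, 6000.0), (100.0, 1000.0)))
```

The disk utilisation check therefore works exactly as before; only values that were counters change representation.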

Best regards,
—octo
-- 
collectd – The system statistics collection daemon
Website: http://collectd.org
Google+: http://collectd.org/+
GitHub:  https://github.com/collectd
Twitter: http://twitter.com/collectd

