Witham, Timothy D wrote:
>> Please have a look at this patch, perhaps it'll help with your
>> endeavor:
>> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=176
> It does look interesting.

I really like that patch too. :-)

>> On Wed, Apr 16, 2008 at 2:01 PM, Rich Paul <[EMAIL PROTECTED]> wrote:
>>> I've been hacking on ganglia, to add the ability to access highly
>>> granular historical data.  This consists of a new php page, which shows
>>> a graph for 1 attribute on 1 host at the time of your choice, a script
>>> which runs as a cronjob in order to copy data from
>>> /var/lib/ganglia/rrds/**/*.rrd to /var/lib/ganglia/hist/**/*.rrd (I
>>> found that when saving a month of data at 4 samples per minute, my
>>> system was spending much time waiting for IO), and some hacks to
>>> graph.php.  Has anybody else played with the ability to look at
>>> arbitrary hours rather than just the most recent hour?
>
> I don't have space for it since my grids are too huge, but it would be
> easier to just keep more detail in the RRDs, which is decided at create
> time.  I haven't yet tried it, but gmetad/conf.c implies that the data
> retention policy could be changed in the config file (I don't see this
> option in the man page though; is that a bug?):
>
>     config->RRAs[0] = "RRA:AVERAGE:0.5:1:244";
>     config->RRAs[1] = "RRA:AVERAGE:0.5:24:244";
>     config->RRAs[2] = "RRA:AVERAGE:0.5:168:244";
>     config->RRAs[3] = "RRA:AVERAGE:0.5:672:244";
>     config->RRAs[4] = "RRA:AVERAGE:0.5:5760:374";
>
> Basically, you would just want to crank up that 244 number for the first
> line or two.  See rrdcreate(1) for details.  This would then store the
> detail you want at the cost of increased RRD file size.  I have been
> thinking of doing the opposite: adding another line for less detailed
> averages beyond a year.
>
> But maybe your parenthetical comment means you did that already but had
> too much waitIO?  And that's why you went to a cron job?
> If so, you are storing the RRDs in tmpfs, right?

I was having a problem with too much waitIO, which is why I switched to
batch processing to move half-hour chunks from rrds/**/metric.rrd to
hist/**/metric.rrd.
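To put numbers on those RRA lines: assuming the usual 15-second step (the 4 samples per minute Rich mentions), the retention window of each RRA is just steps-per-row times step times row count. A quick sketch of the arithmetic, including how many rows the first RRA would need to keep a full month at full resolution:

```python
# Rough retention math for the RRA definitions quoted above.
# STEP is an assumption: the usual 15-second gmetad step (4 samples/min).
STEP = 15  # seconds per primary data point

rras = [
    (1,    244),   # RRA:AVERAGE:0.5:1:244
    (24,   244),   # RRA:AVERAGE:0.5:24:244
    (168,  244),   # RRA:AVERAGE:0.5:168:244
    (672,  244),   # RRA:AVERAGE:0.5:672:244
    (5760, 374),   # RRA:AVERAGE:0.5:5760:374
]

for steps_per_row, rows in rras:
    row_seconds = steps_per_row * STEP
    kept = row_seconds * rows
    print(f"1 row = {row_seconds:>6}s, kept for ~{kept / 86400:.1f} days")

# "Cranking up that 244": to keep a 31-day month at 15-second resolution,
# the first RRA would need this many rows instead:
month_rows = 31 * 24 * 60 * 60 // STEP
print(month_rows)  # i.e. RRA:AVERAGE:0.5:1:178560
```

So the first RRA as shipped covers only about an hour of full-detail data, which is why keeping a month of it inflates the RRD files (and the write load) so much.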
I don't store the stuff in tmpfs, for a couple of reasons.  One is that
I want to lose as little data as possible in the event of a system
failure or reboot.  I also don't want to hog the memory on a server
which serves our development environment in several other capacities.

>>> Also, I am curious as to how the performance of rrdtool would be
>>> affected if we were to store related metrics in a single rrd file:
>>> e.g., we could group cpu_(user,system,idle,wio,nice) in a single file,
>>> which I think would reduce the resource usage of gmetad significantly.
>
> I have wondered that too.  Since RRD is random access, it seems like it
> should be at least as efficient, and probably more efficient since there
> would be fewer files open.  But it would be difficult to change.  Now
> each RRD is simple, with DS:sum and DS:num for summaries; the metric is
> in the filename itself.  To change, you would need to put the metric
> names in the RRDs: DS:cpu_user_sum, etc., and I think you would have to
> update all metrics with one rrd_update call.  Of course this would work
> only for the standard metrics, and extra metrics would still need to be
> in their own files.  Or, perhaps with the new metric groupings, each
> group could be an RRD file of related metrics.  And then you'd have to
> change the PHP to understand all this...

I think you're right on these points.  Probably the only metrics for
which I would be interested in doing this would be the 3 load metrics,
the 5-ish cpu metrics, the 5-ish memory metrics, and the 4-ish network
metrics.  As a matter of taste, I probably would only group metrics
which were in common units (except for the network metrics).

I'm not sure how it would affect the speed, because I don't know whether
rrdtool stores multiple data sources like parallel arrays or like an
array of structs.  In the former case, updating a row of 5 values would
probably dirty 5 sectors in the buffer cache, so there would probably be
little gain.
In the latter case, I suspect one could usually update 5 values while
only dirtying 1 sector in the buffer cache, and you would have a pretty
good win.

> I would guess it was designed with 1 metric in each RRD file to make it
> more flexible in adding/removing metrics, and to make the code simpler.
>
> -twitham

I suspect so as well.  Changing it would be (AFAICT) pretty simple on
the PHP side (mostly only changing graph.php); I'm not sure how simple
it would be in gmetad and gmond.  I think the hardest part would be
convincing gmond to batch grouped metrics into a single message, so that
gmetad is not left receiving metrics one at a time and caching them
until it collects the whole set.  Then again, I played with the .x file
to write a proxy for gmond, and it seemed like a pretty easy config to
hack upon.

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
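For concreteness, the grouped-CPU idea discussed in the thread might look like the following. This is speculation only, nothing gmetad or gmond does today: the DS names match the standard ganglia CPU metrics, but the step, heartbeat, and single-file layout are my guesses. The sketch just builds the rrdtool command strings (so it runs without rrdtool installed); note that one `rrdtool update` with a `--template`/`-t` DS list can carry all five values in a single call, which is the batching Witham describes.

```python
# Hypothetical: one RRD holding the five standard ganglia CPU metrics.
# Step 15s matches the thread's 4-samples-per-minute; GAUGE type and the
# 120s heartbeat are assumptions for illustration.
metrics = ["cpu_user", "cpu_system", "cpu_idle", "cpu_wio", "cpu_nice"]

create_cmd = (
    "rrdtool create cpu.rrd --step 15 "
    + " ".join(f"DS:{m}:GAUGE:120:U:U" for m in metrics)
    + " RRA:AVERAGE:0.5:1:244"
)

# One update call then carries a whole row of related values at once,
# using -t (--template) to name the data sources in order:
values = {"cpu_user": 12.0, "cpu_system": 3.5, "cpu_idle": 80.0,
          "cpu_wio": 4.0, "cpu_nice": 0.5}
update_cmd = (
    "rrdtool update cpu.rrd -t "
    + ":".join(metrics)
    + " N:" + ":".join(str(values[m]) for m in metrics)
)

print(create_cmd)
print(update_cmd)
```

Five doubles per row is only 40 bytes, so if rrdtool lays a row's data sources out contiguously (the array-of-structs case above), the whole update would indeed tend to touch a single sector.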