Re: [Ganglia-developers] 2.6.0

Matt Massie Mon, 22 Mar 2004 11:33:41 -0800

> not that I'm falling for your bait, but would you care to post your
> perl stuff. Just in case someone has to much free time available :-)


here is a perl function which pulls all the data out of a round-robin
database.  you must have rrdtool installed since this function runs
rrdtool dump to get the info (it's xml).  i went through and commented
up the code a lot so that you can understand all the painful details of
what's going on.  you might run

% rrdtool dump ./test.rrd

to see the output of rrdtool dump (it doesn't modify the rrd in any
way).

----------------------------------------------------------------------

my %data = ();

sub collect_rrd_data
{
  my $rrd = shift;
  my $ref = shift;

  # List-form open keeps odd characters in the filename away from the shell
  open(CMD, "-|", "rrdtool", "dump", $rrd) or
                   die "Can't pipe rrdtool output: $!\n";
  
  # Loop through the output
  while (<CMD>)
    {
      # Only process lines with data.  This line assumes
      # all data was created on or after jan 1 2000.
      # Since I released ganglia in 2002.. that's a safe bet.
      next if not /<!-- 200/ or /<lastupdate/;

      # Split up the data line
      my @s = split (/\s+/, $_);

      # Skip any data that is undefined "NaN"
      next if ($s[9] eq "NaN");

      # If there are two datasources, this is a summary database.
      # $s[6] is the timestamp for the data $s[9] is sum $s[11] is num.
      # (with one datasource there is no $s[11] at all)
      if(defined $s[11])
        {
          $$ref{$s[6]}= "$s[9]:$s[11]";
        }
      else
        {
          $$ref{$s[6]}  = "$s[9]";
        }
    }
  close(CMD);
}

# Collect all the data into an associative array
collect_rrd_data("test.rrd", \%data);

# Print the data
for my $t ( sort { $a <=> $b } keys(%data))
{
  print "timestamp = $t data = $data{$t}\n";
}
-----------------------------------------------------------------

you can easily modify this function to taste.
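
if the field numbers ($s[6], $s[9], $s[11]) look like magic, here is a small sketch that splits two made-up rows the same way.  the sample lines are an assumption: they are laid out to match the indices the function uses, and your rrdtool version may format its dump differently, so check against real output first.  row_fields is a hypothetical helper, not part of the function above.

```perl
use strict;
use warnings;

# Made-up sample rows (assumption): one regular, one summary.
# The leading tab matters: it produces the empty field $s[0].
my @sample = (
  "\t<!-- 2004-03-22 11:30:00 PST / 1079983800 --> <row><v> 1.5000000000e+00 </v></row>\n",
  "\t<!-- 2004-03-22 11:30:15 PST / 1079983815 --> <row><v> 3.0000000000e+00 </v><v> 2.0000000000e+00 </v></row>\n",
);

# Return (timestamp, first value, second value or undef) for one row,
# using the same split and indices as collect_rrd_data.
sub row_fields
{
  my $line = shift;
  my @s = split (/\s+/, $line);
  return ($s[6], $s[9], $s[11]);
}

for my $line (@sample)
{
  my ($t, $sum, $num) = row_fields($line);
  print "timestamp = $t sum = $sum num = ",
        (defined $num ? $num : "(none)"), "\n";
}
```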

rrdtool has a whole perl API which is documented at 
http://www.rrdtool.com/perlbind/RRDs.html

i would imagine an upgrade script looking like this...

1. mv /var/lib/ganglia/rrds /var/lib/ganglia/rrds.backup
2. mkdir /var/lib/ganglia/rrds
3. recurse /var/lib/ganglia/rrds.backup
    a. use RRDs::create(new.rrd, ...) to make a new database
    b. use collect_rrd_data(old.rrd,...) to get all the old data points
    c. use RRDs::update(new.rrd, ...) to write the old data to the 
        new database
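
step 3 could be sketched roughly like this, using File::Find from the standard perl distribution.  plan_upgrade is a hypothetical name, and the sketch only works out which new file each old file maps to; the create/collect/update work would go where the plan is printed.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Return a hash mapping each old .rrd path under $old_root to the
# path it would get under $new_root (same relative location).
sub plan_upgrade
{
  my ($old_root, $new_root) = @_;
  my %plan = ();
  find( sub {
          return unless -f $_ and /\.rrd$/;
          my $rel = File::Spec->abs2rel($File::Find::name, $old_root);
          $plan{$File::Find::name} = File::Spec->catfile($new_root, $rel);
        }, $old_root );
  return %plan;
}

my $old = "/var/lib/ganglia/rrds.backup";
my $new = "/var/lib/ganglia/rrds";

# Only walk the tree if the backup directory is really there.
if (-d $old)
{
  my %plan = plan_upgrade($old, $new);
  for my $from (sort keys %plan)
  {
    print "$from -> $plan{$from}\n";
  }
}
```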

here are the create and update calls

RRDs::create( "new.rrd", "-b", "315360000",  "-s", "1",
              "DS:sum:GAUGE:315360000:U:U",
              "RRA:AVERAGE:$xff:15:240",
              "RRA:AVERAGE:$xff:360:240",
              "RRA:AVERAGE:$xff:3600:744",
              "RRA:AVERAGE:$xff:86400:365");

there is a complication.  there are two formats really.. a regular
format (above) and a summary format.  here is the summary format...


RRDs::create( "new.rrd", "-b", "315360000",  "-s", "1",
              "DS:sum:GAUGE:315360000:U:U",
              "DS:num:GAUGE:315360000:U:U",
              "RRA:AVERAGE:$xff:15:240",
              "RRA:AVERAGE:$xff:360:240",
              "RRA:AVERAGE:$xff:3600:744",
              "RRA:AVERAGE:$xff:86400:365");

you can see the summary format has two datasources: sum and num.  the
reason we need the num DS is to know how many data points (hosts)
went into the total sum DS.  we can then divide sum by num to get an
average for example.
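
as a small sketch of that division, assuming the "sum:num" string that collect_rrd_data stores for summary databases.  summary_average is a hypothetical helper, and returning undef when num is zero is my own choice, not something ganglia does.

```perl
use strict;
use warnings;

# Turn a "sum:num" string into an average.
# Returns undef if there is no num part or num is zero.
sub summary_average
{
  my ($sum, $num) = split (/:/, shift);
  return undef if !defined($num) || $num == 0;
  return $sum / $num;
}

my $avg = summary_average("3.0000000000e+00:2.0000000000e+00");
print "average = $avg\n";
```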

how do you tell a summary from a regular archive??  look at the code
snippet above.  you'll see that summary archives will have two values
(separated by a ':').  it's trivial to check if you have a summary
database... check for a ':' in any value.
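
a minimal sketch of that check, run against the hash that collect_rrd_data fills in.  is_summary is a hypothetical name.

```perl
use strict;
use warnings;

# Return 1 if any stored value contains a ':' (summary database),
# 0 otherwise (regular database, or no data at all).
sub is_summary
{
  my $ref = shift;
  for my $v (values %$ref)
  {
    return 1 if index($v, ":") >= 0;
  }
  return 0;
}

my %regular = ( 1079983800 => "1.5" );
my %summary = ( 1079983800 => "3.0:2.0" );
print "regular: ", is_summary(\%regular), "\n";
print "summary: ", is_summary(\%summary), "\n";
```

note that an rrd with no defined data points at all looks "regular" to this check, so the script may want to treat an empty hash as a separate case.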

there is one more complication... if a person wants a custom round-robin
archive format.  feel free to punt on that if you like.. we can tell
people.. if you want a custom round-robin archive format.. you will have
to modify this upgrade script before you run it.  we can make them an
array at the top of the script or something.
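
something like this, maybe.  it is only a sketch: @rra and create_args are hypothetical names, and the 0.5 for $xff is an assumed value since the create calls above only say $xff.  it builds the argument list for RRDs::create without calling it, so you can print the list and look it over first.

```perl
use strict;
use warnings;

# Settings a user would edit before running the upgrade script.
my $xff = 0.5;   # assumed value
my @rra = (      # [ steps, rows ] for each round-robin archive
  [ 15,    240 ],
  [ 360,   240 ],
  [ 3600,  744 ],
  [ 86400, 365 ],
);

# Build the RRDs::create argument list for a regular or summary rrd.
sub create_args
{
  my ($file, $summary, $xff, @rra) = @_;
  my @args = ($file, "-b", "315360000", "-s", "1",
              "DS:sum:GAUGE:315360000:U:U");
  push @args, "DS:num:GAUGE:315360000:U:U" if $summary;
  push @args, map { "RRA:AVERAGE:$xff:$_->[0]:$_->[1]" } @rra;
  return @args;
}

print join("\n", create_args("new.rrd", 1, $xff, @rra)), "\n";
```

the result would then be handed over as RRDs::create(create_args(...)).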

here is the update call snippet...

-----------------------------------
use RRDs;

my $new_rrd = "new.rrd";
my $old_rrd = "old.rrd";

my %old_data = ();

sub check_error
{
  my $ERR=RRDs::error;
  die "Error: $ERR\n" if $ERR;
}

collect_rrd_data( $old_rrd, \%old_data );

# create the new round-robin here (see above)
RRDs::create( $new_rrd, ...);
&check_error();

# the timestamps must go in from oldest to newest
# or update will fail, so sort the keys numerically.
for my $t ( sort { $a <=> $b } keys(%old_data))
{
  RRDs::update( $new_rrd, "$t:$old_data{$t}");
  &check_error();
}
--------------------------------------

collect, create, update/insert... pretty easy.  the nice thing with the
collect_rrd_data call is that you don't need to worry if you are writing
into a summary or regular rrdb.  remember the $old_data{$t} will have a
':'-delimited list automatically if necessary.  you might want to make
the error checking work differently and not just die when an error
occurs.  not sure.
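
one way the softer error handling might look, as a sketch only.  replay_points is a hypothetical helper that takes the update step as a code ref, so it can be tried without RRDs installed; it writes the points oldest first and hands back a list of failures instead of dying.

```perl
use strict;
use warnings;

# Replay data points oldest-first through $update, a code ref that
# returns an error string on failure and undef on success.
# Returns the list of failures instead of dying on the first one.
sub replay_points
{
  my ($update, $ref) = @_;
  my @failed = ();
  for my $t ( sort { $a <=> $b } keys %$ref )
  {
    my $err = $update->("$t:$$ref{$t}");
    push @failed, "$t: $err" if $err;
  }
  return @failed;
}

# Stand-in callback for the demo.  With RRDs installed it could be:
#   sub { RRDs::update("new.rrd", shift); return RRDs::error; }
my %points = ( 1079983815 => "3:2", 1079983800 => "1:1" );
my @seen = ();
my @failed = replay_points( sub { push @seen, shift; return undef; },
                            \%points );
print "wrote: @seen\n";
print "failed: ", scalar(@failed), "\n";
```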

i think this will get you (or any other perl monger out there) well on
your way.  

your work is invaluable and will make any rrd transition in the future
go MUCH more easily.  this script will be able to evolve as ganglia
evolves.

one last complication (that i can think of right now).  the filenames
might change... at minimum when recursing we need to treat host Foo.Bar
the same as foo.bar or FOO.BAR.  don't worry too much about that right
now but try to script with that in mind. 

if you need anything else... please let me know.  

good luck and thanks so much for helping!

-matt

p.s. i hope this isn't going to be the last time you volunteer for
something.  :)

-- 
Mobius strippers never show you their back side
PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E  F40B 242A 5984 ACBC 91D3'
