ramon-

looking in protocol.x (for the XDR message format of a gmetric message) you'll find...

struct Ganglia_gmetric_message {
  string type<>;
  string name<>;
  string value<>;
  string units<>;
  unsigned int slope;
  unsigned int tmax;
  unsigned int dmax;
};

which means that the XDR stub code can take any arbitrary length value (<>).

however, when you look in libgmond.c at the method Ganglia_gmetric_send() you'll find

int
Ganglia_gmetric_send( Ganglia_gmetric gmetric, Ganglia_udp_send_channels send_channels )
{
  int len;
  XDR x;
  char gmetricmsg[1500];
  Ganglia_message msg;

  msg.id = 0;
  memcpy( &(msg.Ganglia_message_u.gmetric), gmetric->msg, sizeof (Ganglia_gmetric_message));

  /* Send the message */
  xdrmem_create(&x, gmetricmsg, 1500, XDR_ENCODE);
  xdr_Ganglia_message(&x, &msg);
  len = xdr_getpos(&x);
  return Ganglia_udp_send_message( send_channels, gmetricmsg, len);
}

that is soooo ugly. it means that i'm xdr encoding the message into a 1500 byte buffer regardless of the real size of the message. the return value from the xdr_Ganglia_message() call is not being checked, so we don't detect when the buffer fills up while there is still data to write. this means that the gmetric message will be sent anyway, and the units, slope, tmax and dmax may be missing and the value string truncated. :(
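to make the missing check concrete, here is a minimal sketch of an encoder that refuses to run past its buffer and reports the failure to the caller. it is a hand-rolled stand-in that mimics the XDR layout (4-byte big-endian words, strings padded to a multiple of 4), not the rpcgen stub code or libgmond itself; bounded_stream, put_u32, put_string and encode_gmetric are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an XDR memory stream: fixed buffer plus cursor. */
typedef struct {
    unsigned char *buf;
    size_t cap;
    size_t pos;
} bounded_stream;

/* Append a big-endian 32-bit word; returns 0 if it would not fit. */
static int put_u32(bounded_stream *s, uint32_t v)
{
    assert(s != NULL && s->buf != NULL);
    if (s->cap - s->pos < 4)
        return 0;
    s->buf[s->pos++] = (unsigned char)(v >> 24);
    s->buf[s->pos++] = (unsigned char)(v >> 16);
    s->buf[s->pos++] = (unsigned char)(v >> 8);
    s->buf[s->pos++] = (unsigned char)v;
    return 1;
}

/* Append a string in XDR layout: length word, bytes, zero padding to 4. */
static int put_string(bounded_stream *s, const char *str)
{
    size_t len = strlen(str);
    size_t padded;

    if (len > s->cap)
        return 0;
    padded = (len + 3) & ~(size_t)3;
    if (!put_u32(s, (uint32_t)len))
        return 0;
    if (s->cap - s->pos < padded)
        return 0;
    memcpy(s->buf + s->pos, str, len);
    memset(s->buf + s->pos + len, 0, padded - len);
    s->pos += padded;
    return 1;
}

/* Encode the message id and gmetric fields.
   Returns the encoded length, or -1 if the buffer is too small. */
static int encode_gmetric(unsigned char *out, size_t cap,
                          const char *type, const char *name,
                          const char *value, const char *units,
                          uint32_t slope, uint32_t tmax, uint32_t dmax)
{
    bounded_stream s = { out, cap, 0 };

    if (!put_u32(&s, 0) ||            /* message id, as in msg.id = 0 */
        !put_string(&s, type) ||
        !put_string(&s, name) ||
        !put_string(&s, value) ||
        !put_string(&s, units) ||
        !put_u32(&s, slope) ||
        !put_u32(&s, tmax) ||
        !put_u32(&s, dmax))
        return -1;                    /* caller must not send anything */
    return (int)s.pos;
}
```

the point is only the shape of the control flow: every write can fail, and a failure anywhere means nothing goes on the wire. in the real code the equivalent would presumably be testing the result of xdr_Ganglia_message() before calling Ganglia_udp_send_message().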

to make things worse, gmond will passively save this goofed up message in a hash until xml is pulled and then xdr decode the message (this is done for efficiency since we may get hundreds of updates before we query gmond).

the fix is to alter the protocol.x file description to be ...

struct Ganglia_gmetric_message {
  string type<12>;
  /* we have encoded 16 bytes to this point... */
  string name<32>;
  /* we have encoded 52 bytes to this point... */
  string value<1416>;
  /* we have encoded 1472 bytes to this point... */
  string units<12>;
  /* we have encoded 1488 bytes to this point.. */
  unsigned int slope;
  /* we have encoded 1492 bytes to this point... */
  unsigned int tmax;
  /* we have encoded 1496 bytes to this point.. */
  unsigned int dmax;
  /* we have encoded 1500 bytes to this point.. */
};
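the running byte counts in those comments follow from how XDR lays out a string: a 4-byte length word, then the characters padded up to a multiple of 4. a small sketch that redoes the arithmetic (the helper names xdr_string_max_size and gmetric_max_size are hypothetical, chosen just for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Worst-case bytes for an XDR string of at most max_len characters:
   a 4-byte length word plus the data rounded up to a multiple of 4. */
static size_t xdr_string_max_size(size_t max_len)
{
    assert(max_len <= SIZE_MAX - 7);   /* keep the rounding from overflowing */
    return 4 + ((max_len + 3) & ~(size_t)3);
}

/* Worst-case encoded size of the struct with the proposed limits. */
static size_t gmetric_max_size(void)
{
    size_t total = 0;

    total += xdr_string_max_size(12);    /* type:  running total 16   */
    total += xdr_string_max_size(32);    /* name:  running total 52   */
    total += xdr_string_max_size(1416);  /* value: running total 1472 */
    total += xdr_string_max_size(12);    /* units: running total 1488 */
    total += 3 * 4;                      /* slope, tmax, dmax: 1500   */
    return total;
}
```

one caveat, stated as an assumption: these totals cover only the struct. if the enclosing Ganglia_message is an XDR union whose id is encoded as a 4-byte word ahead of the struct, that word is not counted here and the limits would need to shrink by 4 bytes to stay inside a 1500-byte buffer.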

altering the protocol definition will not affect backward compatibility or change the network format at all; it will just enforce rules on the size of each attribute of the gmetric message.

this change will have an effect on how we marshal and unmarshal gmetric messages though. gmetric and gmond will need to be patched up to deal with the new gmetric message structure.
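one way the sending side could deal with the limits is to check each field before marshalling, so the user gets an error instead of a broken packet. this is only a sketch under the sizes proposed above; the enum constants and gmetric_oversized_field are hypothetical names, not anything that exists in gmetric today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits mirroring the proposed protocol.x bounds. */
enum {
    GM_TYPE_MAX  = 12,
    GM_NAME_MAX  = 32,
    GM_VALUE_MAX = 1416,
    GM_UNITS_MAX = 12
};

/* Returns the name of the first field that exceeds its limit,
   or NULL if every field fits. */
static const char *gmetric_oversized_field(const char *type, const char *name,
                                           const char *value, const char *units)
{
    assert(type != NULL && name != NULL && value != NULL && units != NULL);

    if (strlen(type) > GM_TYPE_MAX)
        return "type";
    if (strlen(name) > GM_NAME_MAX)
        return "name";
    if (strlen(value) > GM_VALUE_MAX)
        return "value";
    if (strlen(units) > GM_UNITS_MAX)
        return "units";
    return NULL;
}
```

a caller would refuse to send whenever the result is not NULL and print the offending field name, which would have flagged an over-long value like the one in ramon's report before it ever reached gmond.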

i don't have a lot of time to make this change but at least this is a start. if i find some time, i'll try to patch this up but no promises right now.

-matt



On Mar 2, 2006, at 8:19 AM, Ramon Bastiaans wrote:

Hi all,

Here we use a tool which reports some extra statistics through gmetric (job information). I have always assumed the following for the maximum of a gmetric message, which I found in the ChangeLog:

2002-08-30 22:17  sacerdoti

       * lib/ganglia.h (1.2): Maximum multicast message length is 1500
       bytes, the size of an ethernet frame.

2002-08-23 22:37  sacerdoti

       * gmetric/: cmdline.c (1.3), cmdline.h (1.3), gmetric.c (1.7): Now
       you can send gmetrics with up to 1400 characters in the value
       field.

However, that does not seem correct: a gmetric's value can't be 1400 characters. I have a gmetric here with a total string size (including the XML tags and stuff) of 1030 characters, which gets wrapped:

>>> bla_str
'<METRIC NAME="TOGA-JOB-6718" VAL="name=ADDA_bs queue=q_parallel owner=myurkin requested_time=13:00:00 ppn=2 status=R start_timestamp=1141299521 report ed=1141307950 poll_interval=10 domain=irc.sara.nl nodes=gb-r28n16;gb-r28n16;gb-r28n15;gb-r28n15;gb- r28n14;gb-r28n14;gb-r28n13;gb-r28n13;gb-r28n12;gb-r28n12 ;gb- r28n11;gb-r28n11;gb-r28n10;gb-r28n10;gb-r28n9;gb-r28n9;gb-r28n8;gb- r28n8;gb-r28n7;gb-r28n7;gb-r28n6;gb-r28n6;gb-r28n5;gb-r28n5;gb- r28n4;gb-r28n4;gb-r28 n3;gb-r28n3;gb-r28n2;gb-r28n2;gb-r28n1;gb- r28n1;gb-r27n20;gb-r27n20;gb-r27n19;gb-r27n19;gb-r27n18;gb- r27n18;gb-r27n17;gb-r27n17;gb-r27n16;gb-r27n16;gb-r27n15;gb- r27n15;gb-r27n14;gb-r27n14;gb-r27n13;gb-r27n13;gb-r27n12;gb- r27n12;gb-r27n11;gb-r27n11;gb-r27n10;gb-r27n10;gb-r27n9;gb-r27n9;gb- r27n8;gb-r27n8;gb-r27 n7;gb-r27n7;gb-r27n6;gb-r27n6;gb-r27n5;gb- r27n5;gb-r27n4;gb-r27n4;gb-r27n3;gb-r27n3;gb-r27n2;gb-r27n2;gb- r27n1;gb-r27n1;gb-r26n20;gb-r26n20;gb-r26n19;gb-r2 6n19;gb- r26n18;gb-r26n18;gb-r26n17;gb-r26n17;gb-r26n16;gb-r26n16;gb- r26n15;gb-r26n15;gb-r26n14;gb-\n'
>>> len( bla_str )
1030
>>>

So I was checking from my tool for a length of 1400 characters, but I think I actually should be checking for a value length of around 900 characters. I probably shouldn't have assumed an old ChangeLog entry from 2002 was still accurate, and it probably changed when APR was introduced (you know what they say about assumptions ;)), but I'd still like to know.

So my question is: what is the real maximum length for a gmetric (value)?

And perhaps we should incorporate better length/error checking in gmetric, because this wrapped/broken gmetric from above breaks my entire XML stream for the cluster:

Mar 2 16:45:17 ganglia /usr/sbin/gmetad[29108]: Process XML (LISA Cluster): XML_ParseBuffer() error at line 15191: unclosed token

If I have the time I will try to write a patch myself, but it seems buried deep down in the code somewhere and someone else might be able to spot/fix it faster than me.

Kind regards,
- Ramon.

--
ing. R. Bastiaans            HPC - Systems Programmer

SARA - Computing and Networking Services
Kruislaan 415                PO Box 194613
1098 SJ Amsterdam            1090 GP Amsterdam
Tel. +31 (0) 20 592 3000     Fax. +31 (0) 20 668 3167
---
There are really only three types of people:

 Those who make things happen, those who watch things happen
 and those who say, "What happened?"




--
[EMAIL PROTECTED]
  http://massie.us



