ramon-
looking in protocol.x (for the XDR message format of a gmetric
message) you'll find...
struct Ganglia_gmetric_message {
  string type<>;
  string name<>;
  string value<>;
  string units<>;
  unsigned int slope;
  unsigned int tmax;
  unsigned int dmax;
};
which means that the XDR stub code will accept a string of any
length for those fields (that's what the empty <> means).
however, when you look in libgmond.c at the function
Ganglia_gmetric_send() you'll find
int
Ganglia_gmetric_send( Ganglia_gmetric gmetric,
                      Ganglia_udp_send_channels send_channels )
{
  int len;
  XDR x;
  char gmetricmsg[1500];
  Ganglia_message msg;

  msg.id = 0;
  memcpy( &(msg.Ganglia_message_u.gmetric), gmetric->msg,
          sizeof(Ganglia_gmetric_message));

  /* Send the message */
  xdrmem_create(&x, gmetricmsg, 1500, XDR_ENCODE);
  xdr_Ganglia_message(&x, &msg);
  len = xdr_getpos(&x);
  return Ganglia_udp_send_message( send_channels, gmetricmsg, len);
}
that is soooo ugly. it means that i'm xdr encoding the message into
a 1500 byte buffer regardless of the real size of the message. the
return value from the xdr_Ganglia_message() call is not being checked
so we're not detecting when the buffer gets full and we still have
data to write. this means that the gmetric message will be sent and
the units, slope, tmax, dmax may be missing and the value string
truncated. :(
to make things worse, gmond will passively save this goofed up
message in a hash until xml is pulled and then xdr decode the message
(this is done for efficiency since we may get hundreds of updates
before we query gmond).
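to make the failure mode concrete, here is a minimal sketch of an encoder that refuses to write past the end of its buffer and reports that to the caller. it is a hand-rolled stand-in, not the real XDR library: struct bounded_buf, put_string() and encode_metric() are names made up for illustration. the only thing it borrows from XDR is the string layout (4-byte big-endian length, the bytes, zero padding to a multiple of 4).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an XDR memory stream: a fixed buffer plus
 * the current write position. */
struct bounded_buf {
    unsigned char *data;
    size_t cap;
    size_t pos;
};

/* Encode one string in XDR layout. Returns 1 on success, or 0 if the
 * string does not fit, in which case nothing is written. */
int put_string(struct bounded_buf *b, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;   /* round up to multiple of 4 */
    unsigned char *p;

    if (len > 0xFFFFFFFFu || b->cap - b->pos < 4 + padded)
        return 0;                             /* buffer full: tell the caller */

    p = b->data + b->pos;
    p[0] = (unsigned char)((len >> 24) & 0xff);
    p[1] = (unsigned char)((len >> 16) & 0xff);
    p[2] = (unsigned char)((len >> 8) & 0xff);
    p[3] = (unsigned char)(len & 0xff);
    memcpy(p + 4, s, len);
    memset(p + 4 + len, 0, padded - len);     /* zero the padding bytes */
    b->pos += 4 + padded;
    return 1;
}

/* Encode the four string fields. Returns the encoded length, or -1 as
 * soon as any field fails to fit, so a partial message is never sent. */
long encode_metric(unsigned char *out, size_t cap, const char *type,
                   const char *name, const char *value, const char *units)
{
    struct bounded_buf b;

    b.data = out;
    b.cap = cap;
    b.pos = 0;
    if (!put_string(&b, type) || !put_string(&b, name) ||
        !put_string(&b, value) || !put_string(&b, units))
        return -1;
    return (long)b.pos;
}
```

the equivalent check in Ganglia_gmetric_send() would be to test the return value of xdr_Ganglia_message() and bail out before calling Ganglia_udp_send_message() when it reports failure.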
the fix is to alter the protocol.x file description to be ...
struct Ganglia_gmetric_message {
  string type<12>;
  /* we have encoded 16 bytes to this point (4-byte length + 12)... */
  string name<32>;
  /* we have encoded 52 bytes to this point... */
  string value<1416>;
  /* we have encoded 1472 bytes to this point... */
  string units<12>;
  /* we have encoded 1488 bytes to this point.. */
  unsigned int slope;
  /* we have encoded 1492 bytes to this point... */
  unsigned int tmax;
  /* we have encoded 1496 bytes to this point.. */
  unsigned int dmax;
  /* we have encoded 1500 bytes to this point.. */
};
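the running totals in those comments can be checked with a few lines of C. this is just the arithmetic, assuming the usual XDR string layout (a 4-byte length word followed by the data padded to a multiple of 4); xdr_string_max() and gmetric_max_size() are hypothetical helper names, not part of ganglia.

```c
#include <stddef.h>

/* Largest encoding of an XDR string limited to max_len bytes:
 * a 4-byte length word plus the data rounded up to a multiple of 4. */
size_t xdr_string_max(size_t max_len)
{
    return 4 + ((max_len + 3) & ~(size_t)3);
}

/* Worst-case encoded size of the struct with the limits 12/32/1416/12. */
size_t gmetric_max_size(void)
{
    return xdr_string_max(12)     /* type:  16, running total   16 */
         + xdr_string_max(32)     /* name:  36, running total   52 */
         + xdr_string_max(1416)   /* value: 1420, running total 1472 */
         + xdr_string_max(12)     /* units: 16, running total 1488 */
         + 3 * 4;                 /* slope, tmax, dmax: total   1500 */
}
```

note that this counts only the struct body. if the enclosing Ganglia_message also encodes its id as a 4-byte discriminant (as XDR unions normally do), the value limit would have to shrink by 4 to stay inside a 1500-byte buffer.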
altering the protocol definition will not affect backward
compatibility or change the network format at all, it will just
enforce rules on the size of each attribute of the gmetric message.
this change will have an effect on how we marshal and unmarshal
gmetric messages though. gmetric and gmond will need to be patched up
to deal with the new gmetric message structure.
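on the sending side, one possible shape for that patch is a length check before the message is ever encoded. a rough sketch, assuming the limits 12/32/1416/12 from the struct above; the function and helper names are invented for the example and do not exist in gmetric today.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if s is longer than max bytes, else 0. */
static int too_long(const char *s, size_t max)
{
    return strlen(s) > max;
}

/* Returns NULL if every field fits within the proposed limits,
 * otherwise the name of the first field that is too long. */
const char *gmetric_check_lengths(const char *type, const char *name,
                                  const char *value, const char *units)
{
    if (too_long(type, 12))    return "type";
    if (too_long(name, 32))    return "name";
    if (too_long(value, 1416)) return "value";
    if (too_long(units, 12))   return "units";
    return NULL;
}
```

a caller could then refuse to send and print which field was rejected, instead of putting a truncated message on the wire.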
i don't have a lot of time to make this change but at least this is
a start. if i find some time, i'll try to patch this up but no
promises right now.
-matt
On Mar 2, 2006, at 8:19 AM, Ramon Bastiaans wrote:
Hi all,
Here we use a tool which reports some extra statistics through
gmetric (job information).
I have always assumed the following for the maximum of a gmetric
message, which I found in the ChangeLog:
2002-08-30 22:17 sacerdoti
* lib/ganglia.h (1.2): Maximum multicast message length is 1500
bytes, the size of an ethernet frame.
2002-08-23 22:37 sacerdoti
* gmetric/: cmdline.c (1.3), cmdline.h (1.3), gmetric.c (1.7): Now
you can send gmetrics with up to 1400 characters in the value
field.
However, that does not seem to be correct: a gmetric's value can't
actually be 1400 characters long.
I have a gmetric here with total string size (including the XML
tags and stuff) of 1030 characters, which gets wrapped:
>>> bla_str
'<METRIC NAME="TOGA-JOB-6718" VAL="name=ADDA_bs queue=q_parallel
owner=myurkin requested_time=13:00:00 ppn=2 status=R
start_timestamp=1141299521 reported=1141307950 poll_interval=10
domain=irc.sara.nl nodes=gb-r28n16;gb-r28n16;gb-r28n15;gb-r28n15;gb-
r28n14;gb-r28n14;gb-r28n13;gb-r28n13;gb-r28n12;gb-r28n12;gb-
r28n11;gb-r28n11;gb-r28n10;gb-r28n10;gb-r28n9;gb-r28n9;gb-r28n8;gb-
r28n8;gb-r28n7;gb-r28n7;gb-r28n6;gb-r28n6;gb-r28n5;gb-r28n5;gb-
r28n4;gb-r28n4;gb-r28n3;gb-r28n3;gb-r28n2;gb-r28n2;gb-r28n1;gb-
r28n1;gb-r27n20;gb-r27n20;gb-r27n19;gb-r27n19;gb-r27n18;gb-
r27n18;gb-r27n17;gb-r27n17;gb-r27n16;gb-r27n16;gb-r27n15;gb-
r27n15;gb-r27n14;gb-r27n14;gb-r27n13;gb-r27n13;gb-r27n12;gb-
r27n12;gb-r27n11;gb-r27n11;gb-r27n10;gb-r27n10;gb-r27n9;gb-r27n9;gb-
r27n8;gb-r27n8;gb-r27n7;gb-r27n7;gb-r27n6;gb-r27n6;gb-r27n5;gb-
r27n5;gb-r27n4;gb-r27n4;gb-r27n3;gb-r27n3;gb-r27n2;gb-r27n2;gb-
r27n1;gb-r27n1;gb-r26n20;gb-r26n20;gb-r26n19;gb-r26n19;gb-
r26n18;gb-r26n18;gb-r26n17;gb-r26n17;gb-r26n16;gb-r26n16;gb-
r26n15;gb-r26n15;gb-r26n14;gb-\n'
>>> len( bla_str )
1030
>>>
So I was checking from my tool for a length of 1400 characters, but
I actually should be checking for a value length of around 900
characters I think.
I probably shouldn't have assumed an old ChangeLog entry from 2002
was still accurate, and it probably changed when APR was introduced
(you know what they say about assumptions ;)), but I'd still like
to know nevertheless.
So my question is: what is the real maximum length for a
gmetric (value)?
And perhaps we should incorporate better length/error checking in
gmetric, because this wrapped/broken gmetric from above breaks my
entire XML stream for the cluster:
Mar 2 16:45:17 ganglia /usr/sbin/gmetad[29108]: Process XML (LISA
Cluster): XML_ParseBuffer() error at line 15191: unclosed token
If I have the time I will try to write a patch myself, but it seems
buried deep down in the code somewhere and someone else might be
able to spot/fix it faster than me.
Kind regards,
- Ramon.
--
ing. R. Bastiaans HPC - Systems Programmer
SARA - Computing and Networking Services
Kruislaan 415 PO Box 194613
1098 SJ Amsterdam 1090 GP Amsterdam
Tel. +31 (0) 20 592 3000 Fax. +31 (0) 20 668 3167
---
There are really only three types of people:
Those who make things happen, those who watch things happen
and those who say, "What happened?"
--
[EMAIL PROTECTED]
http://massie.us