I understand that it may be caused by some sort of corrupt XML data, but
what can be causing this corrupt XML to be sent from the gmond daemons
at such an infrequent interval, and why does it cause gmetad to just
die?  Can gmetad handle corrupt XML data when it parses it and just skip
that polling interval, waiting for the next one?  The infrequency of
this really makes it hard to diagnose.  Over the 5,760 polling intervals
in a day, there are only a dozen or two warnings like this from the 10
different data source threads, and it causes gmetad to die only 2-3
times a week.
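
To make the "skip that polling interval" idea concrete, here is a minimal sketch.  It is not gmetad's code (gmetad is written in C); it uses Python's binding to expat, the same parser library whose XML_ParseBuffer() call appears in the log messages.  The function name parse_poll and the sample CLUSTER/HOST documents are assumptions made up for illustration.  The point is only that a parse error can be caught, logged, and the whole document dropped, so the caller simply waits for the next poll.

```python
import xml.parsers.expat


def parse_poll(xml_bytes):
    """Parse one polled XML document (hypothetical helper).

    Returns the list of HOST names, or None if the document is
    malformed, in which case the caller should skip this interval.
    """
    hosts = []

    def start_element(name, attrs):
        # Collect host names as they are seen.
        if name == "HOST":
            hosts.append(attrs.get("NAME"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    try:
        parser.Parse(xml_bytes, True)  # True: this is the final chunk
    except xml.parsers.expat.ExpatError as err:
        # Log the problem and discard everything parsed so far, so a
        # half-read document never reaches the data store.
        reason = xml.parsers.expat.ErrorString(err.code)
        print(f"skipping poll: error at line {err.lineno}: {reason}")
        return None
    return hosts


if __name__ == "__main__":
    # Made-up sample documents, not real gmond output.
    good = b'<CLUSTER NAME="c1"><HOST NAME="foo"/></CLUSTER>'
    dup = b'<CLUSTER NAME="c1"><HOST NAME="foo" NAME="foo"/></CLUSTER>'
    for doc in (good, dup):
        result = parse_poll(doc)
        if result is None:
            continue  # wait for the next polling interval
        print("hosts:", result)
```

Returning None rather than a partial host list is deliberate: a document that fails halfway through may already have delivered some elements to the handler, and those are thrown away with the rest.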

Maybe this could be a hardware problem.  We might try setting up a second
frontend node to see if we have the same problem.

~Jason


On Wed, 2003-09-03 at 17:38, matt massie wrote:
> jason-
> 
> i would check the XML that your data sources are outputting.  it appears 
> it is not valid XML and has some elements with duplicate attributes.
> 
> for example,
> 
> <HOST NAME="foo" NAME="foo"...>
> 
> or something like that.
> 
> -matt
> 
> Today, Jason A. Smith wrote forth saying...
> 
> > From: Jason A. Smith <[EMAIL PROTECTED]>
> > To: Ganglia Developers <ganglia-developers@lists.sourceforge.net>
> > Date: Wed, 03 Sep 2003 17:35:52 -0400
> > Subject: [Ganglia-developers] gmetad crashing.
> > 
> > We recently started having problems with gmetad just dying
> > unexpectedly, with no explanation in the system log.  The only unusual
> > thing is a few XML parse errors several hours before gmetad dies:
> > 
> > Sep  3 04:50:14 ganglia01 /usr/sbin/gmetad[16437]: Process XML (Cluster
> > 1): XML_ParseBuffer() error at line 5467: not well-formed  
> > Sep  3 07:01:11 ganglia01 /usr/sbin/gmetad[16435]: Process XML (Cluster
> > 2): XML_ParseBuffer() error at line 388: duplicate attribute  
> > Sep  3 07:01:12 ganglia01 /usr/sbin/gmetad[16437]: Process XML (Cluster
> > 1): XML_ParseBuffer() error at line 2383: not well-formed  
> > Sep  3 07:01:25 ganglia01 /usr/sbin/gmetad[16437]: Process XML (Cluster
> > 1): XML_ParseBuffer() error at line 4368: not well-formed  
> > Sep  3 09:46:13 ganglia01 /usr/sbin/gmetad[16429]: Process XML (Cluster
> > 3): XML_ParseBuffer() error at line 569: not well-formed  
> > Sep  3 09:46:13 ganglia01 /usr/sbin/gmetad[16437]: RRD_update
> > (/var/lib/ganglia/rrds/Cluster 1/rcas2100.rcf.bnl.gov/mem_free.rrd):
> > illegal attempt to update using time 1062596767 when last update time is
> > 1062596767 (minimum one second step) 
> > Sep  3 09:46:13 ganglia01 /usr/sbin/gmetad[16437]: Process XML (Cluster
> > 1): XML_ParseBuffer() error at line 1850: not well-formed  
> > Sep  3 12:14:41 ganglia01 /usr/sbin/gmetad[16430]: Process XML (Cluster
> > 4): XML_ParseBuffer() error at line 2: junk after document element  
> > Sep  3 12:34:13 ganglia01 /usr/sbin/gmetad[16436]: Process XML (Cluster
> > 5): XML_ParseBuffer() error at line 1298: not well-formed  
> > Sep  3 13:12:20 ganglia01 /usr/sbin/gmetad[16434]: Process XML (Cluster
> > 6): XML_ParseBuffer() error at line 527: not well-formed  
> > 
> > Then gmetad dies almost an hour later.
> > 
> > Any ideas what the problem could be?  I have tried restarting gmetad
> > with debugging and will wait to see if it happens again.
> > 
> > ~Jason
> > 
> > 
> > -- 
> > /------------------------------------------------------------------\
> > |  Jason A. Smith                          Email:  [EMAIL PROTECTED] |
> > |  Atlas Computing Facility, Bldg. 510M    Phone:  (631)344-4226   |
> > |  Brookhaven National Lab, P.O. Box 5000  Fax:    (631)344-7616   |
> > |  Upton, NY 11973-5000                                            |
> > \------------------------------------------------------------------/

