Ah. I also suspect that some weird gmetric is causing this, but unfortunately I have not been able to find it in the XML so far.
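One way to hunt for such a metric in a large dump is a plain line-by-line scan, which still works when the XML is not well-formed. The sketch below is only an illustration: it assumes metrics appear as `<METRIC NAME="..."` in the port 8651 output, and the set of "safe" characters and the name `suspicious_metrics` are assumptions chosen for this example, not anything taken from gmetad itself.

```python
import re

# Assumption: names built only from these characters are considered harmless.
SAFE_NAME = re.compile(r'[A-Za-z0-9_.\-]+')
# Assumption: each metric is written as <METRIC NAME="..." ...> in the dump.
METRIC_NAME = re.compile(r'<METRIC\s+NAME="([^"]*)"')


def suspicious_metrics(lines):
    """Yield (line_number, name) for metric names with unusual characters."""
    for lineno, line in enumerate(lines, start=1):
        for name in METRIC_NAME.findall(line):
            if not SAFE_NAME.fullmatch(name):
                yield lineno, name


# Possible usage on a saved dump (the file name is hypothetical):
#   with open("dump.xml", errors="replace") as f:
#       for lineno, name in suspicious_metrics(f):
#           print(lineno, repr(name))
```

Comparing the reported line numbers with the line number in the XML_ParseBuffer() error may help narrow things down, although the dump and the buffer gmetad parsed are not guaranteed to line up.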
Well, regardless of the cause, I think it should not make the interactive port stop responding and the web interface hang. Having had a quick look at the gmetad source, I was not able to find where this might originate.

Perhaps the web interface could fall back to port 8651 if port 8652 times out.

- Ramon

P.S. pbs-python is still alive and well. If you mean "Job Monarch": I have been working hard recently on a new release and it is nearly (99%) finished. ;)

pbswebmon is a completely different project, which SARA is not associated with and has no role in.

As of January 2013, SARA has a new name: SURFsara.

ing. Ramon Bastiaans - Senior Systems Programmer - Cluster Computing
| Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG
Amsterdam | T +31 (0)20 592 30 00 | ramon.bastia...@surfsara.nl |
www.surfsara.nl |

On 4 apr. 2013, at 18:52, Chris Hunter <chris.hun...@yale.edu> wrote:

> Hi,
>
> We have seen this before (ganglia-gmond 3.2) when there is whitespace
> or there are non-alphanumeric characters in custom gmetrics.
>
> PS I hope pbs-python/pbswebmon are still active...
>
>
>> Hi,
>>
>> We have been experiencing a weird issue with gmetad.
>>
>> I am running gmetad v3.4.0
>>
>> Once in a while an XML error seems to occur, like this:
>>
>> /usr/sbin/gmetad[12241]: Process XML (LISA Cluster): XML_ParseBuffer() error
>> at line 525626:
>>
>> Apart from what is causing that and why, this is causing the Ganglia web
>> front end to hang and become unresponsive.
>>
>> After checking gmetad, it seems port 8652 is no longer responding to
>> queries. This does nothing:
>>
>> # telnet localhost 8652
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> /LISA Cluster
>>
>> <after about 1 minute>
>> Connection closed by foreign host.
>>
>>
>> However, port 8651 still works:
>>
>> # telnet localhost 8651 | wc -l
>> Connection closed by foreign host.
>> 921410
>>
>> And when I switch the web frontend from port 8652 back to port 8651
>> ($conf['ganglia_port'] = 8651;), the web page responds and works again.
>>
>> After restarting gmetad, port 8652 also becomes responsive again. It almost
>> seems gmetad has a thread that lost its way or something.
>>
>> Any idea what may be causing this (besides the XML error)? It seems weird to
>> me that one port works and the other no longer does. It might be a bug.
>>
>> I have a dump of the XML (from port 8651 before restarting) available for
>> whoever might want it, but it is 42 MB.
>>
>>
>> Kind regards,
>> - Ramon.
>>
>> As of January 2013, SARA has a new name: SURFsara.
>>
>> ing. Ramon Bastiaans - Senior Systems Programmer - Cluster Computing
>> | Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG
>> Amsterdam | T +31 (0)20 592 30 00 | ramon.bastia...@surfsara.nl |
>> www.surfsara.nl |
>
> ------------------------------------------------------------------------------
> Minimize network downtime and maximize team effectiveness.
> Reduce network management and security costs.Learn how to hire
> the most talented Cisco Certified professionals. Visit the
> Employer Resources Portal
> http://www.cisco.com/web/learning/employer_resources/index.html
> _______________________________________________
> Ganglia-developers mailing list
> Ganglia-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-developers
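The fallback idea from the reply above (try port 8652 first, and use port 8651 if it times out) could be sketched roughly as follows. This is a standalone Python illustration of the logic only, not the web frontend's actual code; the function names `read_all` and `fetch_with_fallback`, the default query, and the timeout value are all assumptions made for this example.

```python
import socket


def read_all(host, port, query=None, timeout=10.0):
    """Connect, optionally send a query line, and read until the peer closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if query is not None:
            sock.sendall(query.encode() + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(65536)  # raises a timeout error if the peer stalls
            if not chunk:
                break
            chunks.append(chunk)
    data = b"".join(chunks)
    if not data:
        # Matches the symptom in this thread: connection accepted, nothing sent.
        raise OSError("empty response from port %d" % port)
    return data


def fetch_with_fallback(host="localhost", query="/", timeout=10.0,
                        interactive_port=8652, xml_port=8651):
    """Return (port_used, xml_bytes), preferring the interactive port."""
    try:
        return interactive_port, read_all(host, interactive_port, query, timeout)
    except OSError:  # covers timeouts, refused connections, empty replies
        # The plain XML port dumps the whole tree without needing a query.
        return xml_port, read_all(host, xml_port, None, timeout)
```

Note that the timeout here applies to each individual socket operation rather than to the whole transfer, and that falling back means the caller receives the full tree and has to filter it itself, which is more expensive for a large cluster.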