Ah. I also suspect that some odd gmetric is causing this, but so far I have not 
been able to find it in the XML, unfortunately.
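
For anyone who wants to search a dump for such a gmetric, here is a minimal 
sketch. It assumes metric elements look like <METRIC NAME="..." ...> and that 
"safe" names consist only of letters, digits, underscore, dot and hyphen. Both 
the function name and the safe character set are my own assumptions, not 
anything gmetad defines. It scans line by line with a regular expression 
rather than an XML parser, so it also works on a dump that is not well-formed.

```python
import re

# Assumption: metric elements are written as <METRIC NAME="..." ...>.
METRIC_NAME = re.compile(r'<METRIC\s+NAME="([^"]*)"')
# Assumption: a conservative guess at what a "normal" metric name looks like.
SAFE_NAME = re.compile(r'[A-Za-z0-9_.\-]+')


def suspicious_metric_names(lines):
    """Yield (line_number, name) for metric names outside the safe set.

    `lines` is any iterable of text lines, e.g. an open file object.
    Empty names are reported as suspicious too.
    """
    for lineno, line in enumerate(lines, start=1):
        for name in METRIC_NAME.findall(line):
            if not SAFE_NAME.fullmatch(name):
                yield lineno, name
```

Usage would be something like opening the dump with 
open("dump.xml", errors="replace") and printing every (line, name) pair the 
generator yields. The file name is only a placeholder.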

Well, regardless of the cause, I think it should not make the interactive port 
stop responding or the web interface hang.

Having had a quick look at the gmetad source, I was not able to find where this 
might originate. Perhaps the web interface could fall back to port 8651 if port 
8652 times out.
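
A minimal sketch of that fallback idea, written in Python rather than in the 
web frontend's PHP, purely as an illustration. The function names, the default 
timeout and the "/" request are my own assumptions. The only things taken 
from this thread are the two port numbers and the observed behaviour: port 
8652 accepts the connection but never answers, while port 8651 sends the full 
XML and closes.

```python
import socket


def read_gmetad(host, port, request=None, timeout=10.0):
    """Connect, optionally send one request line, and read until the peer closes.

    The timeout applies to the connect and to each individual recv(),
    not to the transfer as a whole.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if request is not None:
            sock.sendall(request.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def fetch_with_fallback(host="localhost", interactive_port=8652,
                        xml_port=8651, request="/", timeout=10.0):
    """Try the interactive port first; on error, timeout or an empty reply,
    fall back to the plain XML port. Returns (port_used, raw_bytes)."""
    try:
        data = read_gmetad(host, interactive_port, request, timeout)
        if data:
            return interactive_port, data
    except OSError:  # includes socket.timeout and refused connections
        pass
    # The XML port takes no request; it dumps everything and closes.
    return xml_port, read_gmetad(host, xml_port, None, timeout)
```

The trade-off is that the fallback returns the complete dump instead of the 
filtered subtree, so the caller has to cope with a much larger document.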

- Ramon

P.S. pbs-python is still alive and well. If you mean "Job Monarch", I have been 
working hard recently on a new release and it is nearly (99%) finished. ;) 
pbswebmon is a completely different project, which SARA is not associated with 
and has no role in.


As of January 2013, SARA has a new name: SURFsara.

ing. Ramon Bastiaans - Senior Systems Programmer - Cluster Computing
| Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG 
Amsterdam | T +31 (0)20 592 30 00 | ramon.bastia...@surfsara.nl | 
www.surfsara.nl |




On 4 apr. 2013, at 18:52, Chris Hunter <chris.hun...@yale.edu> wrote:

> Hi,
> 
> We have seen this before (ganglia-gmond 3.2) when there are whitespace 
> or non-alphanumeric characters in custom gmetrics.
> 
> PS I hope pbs-python/pbswebmon are still active...
> 
> 
>> Hi,
>> 
>> We have been experiencing a weird issue with gmetad.
>> 
>> I am running gmetad v3.4.0
>> 
>> Once in a while an XML error occurs, like this:
>> 
>> /usr/sbin/gmetad[12241]: Process XML (LISA Cluster): XML_ParseBuffer() error 
>> at line 525626:
>> 
>> Apart from what is causing that and why, this is causing the Ganglia web 
>> front end to hang and become unresponsive.
>> 
>> After checking gmetad, it seems port 8652 is no longer responding to 
>> queries. This does nothing:
>> 
>> # telnet localhost 8652
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> /LISA Cluster
>> 
>> <after about 1 minute>
>> Connection closed by foreign host.
>> 
>> 
>> However port 8651 still works:
>> 
>> # telnet localhost 8651 | wc -l
>> Connection closed by foreign host.
>> 921410
>> 
>> And when I switch the web frontend from port 8652 back to port 8651 
>> ($conf['ganglia_port'] = 8651;), the web page responds and works again.
>> 
>> After restarting gmetad, port 8652 becomes responsive again. It almost 
>> seems as if a gmetad thread has lost its way or something.
>> 
>> Any idea what may be causing this (besides the XML error)? It seems odd to 
>> me that one port works and the other no longer does. It might be a bug.
>> 
>> I have a dump of the XML (from port 8651 before restarting) available for 
>> anyone who might want it, but it is 42 MB.
>> 
>> 
>> Kind regards,
>> - Ramon.
>> 

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
