Mark,

Servers disappearing from colmux output on an Exadata cluster can be fixed by
adding "-o ServerAliveInterval=300" to the colmux ssh command. With this
option the client (colmux) sends a message to the server (the machine being
monitored) whenever 300 seconds pass without data from it. The message goes
over the encrypted channel (hence not spoofable) and keeps the ssh connection
from timing out.

I have tested this by adding the option to the ssh command variable. You may
want to include it in the colmux source code itself.

my $Ssh='/usr/bin/ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=300';
$Ssh.=" -q"    unless $debug;
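
A minimal sketch of how the keepalive settings could be made configurable
instead of hard-coded. The helper build_ssh_command and its interval, count
and debug parameters are hypothetical and not part of colmux;
ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client
options.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of colmux: assemble the ssh command string
# with keepalive options applied.
sub build_ssh_command {
    my (%opt) = @_;
    my $interval = defined $opt{interval} ? $opt{interval} : 300;
    my $count    = defined $opt{count}    ? $opt{count}    : 3;

    my $ssh = '/usr/bin/ssh -o StrictHostKeyChecking=no';
    $ssh .= " -o ServerAliveInterval=$interval";   # seconds of silence before a probe
    $ssh .= " -o ServerAliveCountMax=$count";      # unanswered probes before ssh gives up
    $ssh .= ' -q' unless $opt{debug};
    return $ssh;
}

print build_ssh_command(interval => 300), "\n";
# prints: /usr/bin/ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=300 -o ServerAliveCountMax=3 -q
```

With these values ssh would treat the connection as dead after roughly
interval x count seconds without any reply from the server.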


Vishal

From:  Vishal Gupta <[email protected]>
Date:  Monday, 25 February 2013 11:49
To:  Vishal Gupta <[email protected]>, Mark Seger <[email protected]>
Cc:  Collectl Interest <[email protected]>
Subject:  Re: [Collectl-interest] colmux duplicating nodes

Mark,

I think my servers disappearing might be due to an SSH timeout.

From:  Vishal Gupta <[email protected]>
Date:  Wednesday, 24 October 2012 21:42
To:  Mark Seger <[email protected]>
Cc:  Collectl Interest <[email protected]>
Subject:  Re: [Collectl-interest] colmux duplicating nodes

Mark, 

I don't think my servers disappearing from colmux is due to a network
glitch. On an Exadata, all the servers are connected via an internal Cisco IP
switch. There are also three dedicated InfiniBand switches. I have tried over
both the Cisco IP switch and the InfiniBand switch with --age=5 as well as 10,
but my servers still disappear from the output after a few hours. Is there
anything I can do to debug this? What level of debug do you recommend?

Regards,
Vishal Gupta
Blog <http://blog.vishalgupta.com/>  |  LinkedIn
<http://www.linkedin.com/in/vishalgupta77>  | Twitter
<https://twitter.com/vishalgupta77>

-----Original Message-----
From: Mark Seger <[email protected]>
Subject: Re: [Collectl-interest] colmux duplicating nodes
Date: 20 October 2012 12:19:08 BST
To: Vishal Gupta <[email protected]>
Cc: [email protected]

On Fri, Oct 19, 2012 at 4:16 PM, Vishal Gupta <[email protected]>
wrote:
> I am using colmux on an Oracle Exadata Machine full rack with Linux hosts (OEL
> 5.7). If colmux is left running for a few hours, it starts showing duplicate
> lines for servers in the output.

are you using the latest version [3.2.0]?  I do remember seeing that in an
earlier version and I thought I fixed it.  I'm really hoping it's not still
there because it can be pretty painful to track down or even reproduce.  The
way colmux works is it asynchronously receives/stores data from each remote
host and at the same time fires a timer every monitoring interval.  Colmux
then displays the latest value it's seen for each entry.  Sounds simple
enough but it turned out the incoming data was occasionally overwriting the
data from the previous samples.  My solution was to double-buffer the data,
reading from one dataset while writing to a new one.  I'm just hoping I
don't need to dig back into it.
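
The double-buffering idea described above can be sketched roughly as follows.
This is an illustration only, not colmux's actual code: the names
store_sample, timer_tick, %incoming and %display are hypothetical, and
carrying the last known value forward for hosts with no new sample is an
assumption based on "displays the latest value it's seen".

```perl
use strict;
use warnings;

# Illustrative sketch only, not colmux's actual code.
my %incoming;   # write buffer: filled as samples arrive from remote hosts
my %display;    # read buffer: what the interval timer prints

# Called whenever a line of data arrives from a host.
sub store_sample {
    my ($host, $line) = @_;
    $incoming{$host} = $line;
}

# Called once per monitoring interval: publish what has arrived, then start
# a fresh write buffer so later arrivals cannot alter what is being shown.
sub timer_tick {
    %display  = (%display, %incoming);   # newer samples replace older ones
    %incoming = ();
    return map { "$_ $display{$_}" } sort keys %display;
}

store_sample('node1', 'cpu 10');
store_sample('node2', 'cpu 20');
print "$_\n" for timer_tick();   # node1 cpu 10, node2 cpu 20

store_sample('node1', 'cpu 15');
print "$_\n" for timer_tick();   # node1 cpu 15, node2 cpu 20 (last known value)
```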
 
> Also, I noticed that some of the hosts are automatically removed completely
> from the output. Is there some kind of timeout configured in colmux or
> collectl that might remove the server entries from the output over time?

unfortunately the way colmux works is if it doesn't hear from a remote
server in x-seconds (which you can set via --age) it drops it from the list
and doesn't try to reconnect.  as for the age, you don't want to make it too
long or else a server could disconnect and you'd never know it and keep
displaying stale data.  I suppose on a glitchy network you could end up
having to wait a little longer.  Maybe you could try upping it to 5 or 10
and see if that helps OR if the remote machine really did drop the link.
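
The age-based drop described above could look something like the sketch
below. It is an illustration only, not colmux's actual code: note_sample,
drop_stale and %last_seen are hypothetical names, and timestamps are passed
in explicitly so the behaviour is easy to follow.

```perl
use strict;
use warnings;

# Illustrative sketch only, not colmux's actual code.
my %last_seen;   # host => epoch seconds of the most recent sample

# Record that a sample from $host arrived at time $now.
sub note_sample {
    my ($host, $now) = @_;
    $last_seen{$host} = $now;
}

# Remove hosts that have been silent for more than $age seconds and return
# their names. As in the description, nothing here tries to reconnect.
sub drop_stale {
    my ($age, $now) = @_;
    my @dropped = sort grep { $now - $last_seen{$_} > $age } keys %last_seen;
    delete @last_seen{@dropped};
    return @dropped;
}

note_sample('node1', 100);
note_sample('node2', 104);
my @gone = drop_stale(5, 108);   # node1 silent for 8s, node2 for 4s
print "dropped: @gone\n";        # prints: dropped: node1
```

A larger age tolerates longer gaps, at the cost of showing stale data for
longer when a host really has gone away.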

you're not the first to ask about reconnecting when a host drops...

-mark
 
> Regards,
> Vishal Gupta
> Blog <http://blog.vishalgupta.com/>  |  LinkedIn
> <http://www.linkedin.com/in/vishalgupta77>  | Twitter
> <https://twitter.com/vishalgupta77>
> 




_______________________________________________
Collectl-interest mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/collectl-interest
