Mark,
I am using the following versions:
collectl-3.6.3-2
collectl-utils-3.1.0-1
I am using the older version of collectl-utils because I had a problem with the
newer version. I will install the newer collectl-utils again to get an exact
description of the problem and let you know why I did not use the newer colmux.
If I don't see the problem I faced earlier, I will try the newer version to see
whether duplicate server entries are still listed in the output. I will also
try the --age flag to see if the "servers-disappearing-in-output" problem goes
away with an increased value of --age.
Thanks for the lovely tool and your prompt support. Not only is your tool
fantastic, your support is even better.
Regards,
Vishal Gupta
Blog | LinkedIn | Twitter
-----Original Message-----
From: Mark Seger <[email protected]>
Subject: Re: [Collectl-interest] colmux duplicating nodes
Date: 20 October 2012 12:19:08 BST
To: Vishal Gupta <[email protected]>
Cc: [email protected]
On Fri, Oct 19, 2012 at 4:16 PM, Vishal Gupta <[email protected]> wrote:
I am using colmux on an Oracle Exadata Machine full rack with Linux hosts (OEL
5.7). If colmux is left running for a few hours, it starts showing duplicate
lines for servers in the output.
Are you using the latest version [3.2.0]? I do remember seeing that in an
earlier version and I thought I fixed it. I'm really hoping it's not still
there because it can be pretty painful to track down or even reproduce. The
way colmux works is it asynchronously receives/stores data from each remote
host and at the same time fires a timer every monitoring interval. Colmux then
displays the latest value it's seen for each entry. Sounds simple enough, but it
turned out the incoming data was occasionally overwriting the data from the
previous samples. My solution was to double-buffer the data, reading from one
dataset while writing to a new one. I'm just hoping I don't need to dig back
into it.
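The double-buffering idea described above can be sketched roughly as follows.
This is a minimal Python illustration under stated assumptions, not colmux's
actual code: the class and method names (DoubleBuffer, store, swap) are
hypothetical, and a lock stands in for whatever synchronisation colmux really
uses. Receivers write into one dataset while the display side reads a separate
snapshot that is never modified after it is handed out.

```python
import threading


class DoubleBuffer:
    """Hypothetical sketch: receivers write to one dict, the display reads another."""

    def __init__(self):
        self._lock = threading.Lock()
        self._incoming = {}  # filled by receiver threads as data arrives
        self._display = {}   # snapshot handed to the display side

    def store(self, host, sample):
        # Called whenever a sample arrives from a remote host.
        # Keyed by host, so a host can never appear twice.
        with self._lock:
            self._incoming[host] = sample

    def swap(self):
        # Called once per monitoring interval by the timer.
        # A new dict is built each time, so a snapshot returned earlier
        # is never overwritten by data that arrives later.
        with self._lock:
            merged = dict(self._display)   # carry forward the last value seen
            merged.update(self._incoming)  # newer samples win
            self._display = merged
            self._incoming = {}
            return merged
```

For example, after storing samples for two hosts and calling swap(), a later
store() for one of them does not change the snapshot already returned; the new
value only shows up in the snapshot produced by the next swap().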
Also, I noticed that some of the hosts are automatically removed from the
output completely. Is there some kind of timeout configured in colmux or
collectl which might remove the server entries from the output over time?
Unfortunately, the way colmux works is if it doesn't hear from a remote server
in x seconds (which you can set via --age) it drops it from the list and
doesn't try to reconnect. As for the age, you don't want to make it too long
or else a server could disconnect and you'd never know it and keep displaying
stale data. I suppose on a glitchy network you could end up having to wait a
little longer. Maybe you could try upping it to 5 or 10 and see if that helps
OR if the remote machine really did drop the link.
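To make the age trade-off concrete, here is a minimal Python sketch of an
age-based drop rule. It is an illustration only and not how colmux implements
--age internally: the function name prune_stale and the last_seen mapping
(host name to the time of its last sample, in seconds) are assumptions made
for this example.

```python
import time


def prune_stale(last_seen, age, now=None):
    """Return only the hosts heard from within `age` seconds (hypothetical helper)."""
    if now is None:
        now = time.time()
    # A host whose last sample is older than `age` seconds is dropped.
    # Nothing here reconnects to it, matching the behaviour described above.
    return {host: ts for host, ts in last_seen.items() if now - ts <= age}
```

With last samples at t=100 for host "a" and t=93 for host "b", checking at
t=101 with an age of 5 keeps only "a", while an age of 10 keeps both. A larger
age tolerates a slow or glitchy host for longer, at the cost of showing its old
data for longer if it really has gone away.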
You're not the first to ask about reconnecting when a host drops...
-mark
Regards,
Vishal Gupta
_______________________________________________
Collectl-interest mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/collectl-interest