Hi, Sam -
We've got a similar deployment (EC2 instances unicasting to a per-AZ
gmetad) that we're managing with Puppet, and I can't say we've seen
anything like that.
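How are you automating your redeployments and gmond configurations? Could
your gmond instances be starting up before their unicast configurations
have been applied? If you had some sort of race condition where gmond could
be installed and started, and *then* have its conf file written, I'd
expect gmond to merrily chug along, fruitlessly trying to multicast into
the void. As far as I know, gmond only reads its config at startup, so it
would keep using whatever it found then (e.g. the packaged multicast
default) until it's restarted, which would also explain why a manual
restart fixes it.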
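If you want to rule that out, one option is to have the deploy step check the rendered config before (re)starting gmond. Here's a rough Python sketch, not something we actually run (we do this in Puppet). It assumes gmond.conf uses the stock syntax, with `udp_send_channel { ... }` blocks that set `host =` for unicast or `mcast_join =` for multicast, and the helper names `send_channel_modes` and `is_unicast_only` are made up for illustration.

```python
import re
from typing import List

# Hypothetical helper, not part of Ganglia. gmond.conf is parsed by libConfuse,
# which accepts /* block */, # line and // line comments; strip them first.
_COMMENTS = re.compile(r"/\*.*?\*/|#[^\n]*|//[^\n]*", re.DOTALL)
# Assumes send-channel blocks contain no nested braces (true for the stock layout).
_SEND_BLOCK = re.compile(r"udp_send_channel\s*\{([^}]*)\}")


def send_channel_modes(conf_text: str) -> List[str]:
    """Classify each udp_send_channel block as 'unicast', 'multicast' or 'unknown'."""
    text = _COMMENTS.sub("", conf_text)
    modes = []
    for body in _SEND_BLOCK.findall(text):
        if re.search(r"\bmcast_join\s*=", body):
            modes.append("multicast")  # still sending to a multicast group
        elif re.search(r"\bhost\s*=", body):
            modes.append("unicast")    # sending straight to a named host (the bastion)
        else:
            modes.append("unknown")
    return modes


def is_unicast_only(conf_text: str) -> bool:
    """True only if there is at least one send channel and every one is unicast."""
    modes = send_channel_modes(conf_text)
    return bool(modes) and all(m == "unicast" for m in modes)
```

You'd feed it the contents of the rendered file (on Debian/Ubuntu that's usually /etc/ganglia/gmond.conf) once userdata has finished, and only start or restart gmond when it returns True. It won't catch a unicast host that's wrong, just a config that was never filled in.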
Good luck!
On Wed, Nov 12, 2014 at 2:41 PM, Sam Barham <[email protected]> wrote:
> We've got about 100 machines running on AWS EC2, with Ganglia for
> monitoring. Because we are on Amazon, we can't use multicast, so in our
> architecture each cluster has a bastion machine, and every other
> machine in the cluster has gmond send its data to the bastion, which
> gmetad then queries. All standard and sensible, and it works just fine.
>
> Except that occasionally, when I redeploy the machines in a cluster (but
> not the bastion; that stays running through this operation), one of
> the machines won't send its data through to the bastion, or something
> along those lines. All I can say for sure is that gmond is running OK on
> the problem machine, there are no errors in the logs on the problem
> machine, the bastion or the gmetad machine, yet the machine doesn't
> appear in gmetad. If I log into the problem machine and restart gmond,
> it reconnects just fine and appears in gmetad.
>
> Which machine has the error is random - it's not a particular type of
> machine or anything. Because the error only shows up rarely, and only at
> deployment time, I can't really turn on debug_level to investigate.
>
> Also, some of the configuration values in gmond.conf are filled in when
> the userdata is run. I've edited /etc/init.d/ganglia-monitor so that it
> starts up immediately after the userdata has run, just in case that matters.
>
> Any ideas?
>
> Sam
>
>
> ------------------------------------------------------------------------------
> Comprehensive Server Monitoring with Site24x7.
> Monitor 10 servers for $9/Month.
> Get alerted through email, SMS, voice calls or mobile push notifications.
> Take corrective actions from your mobile device.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=154624111&iu=/4140/ostg.clktrk
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>
>
--
<http://www.marketlive.com/>
Joe Gracyk | *DevOps Developer*
707-780-1848 | [email protected]
Follow us on Facebook: <http://www.facebook.com/marketlive>
<https://twitter.com/marketliveinc>
<http://www.linkedin.com/company/marketlive>
<http://www.marketlive-blog.com/> <http://www.marketlive.com/summit2015/>