Nicholas Satterly <nfsatterly <at> gmail.com> writes:
> 
> [1] https://github.com/ganglia/monitor-core/compare/master...feature/cloud
> 


Nick,

Thanks for your work in implementing this feature.  I'm in the same boat with a
large(ish) EC2 (VPC) deployment and sorely missing ganglia in this new
environment.

I've found and fixed one bug pertaining to localtime versus GMT in the EC2 apr
request:

https://github.com/ganglia/monitor-core/pull/112

Amazon expects all timestamps to be in GMT.  Some of my hosts have their local
time set to something other than GMT (don't ask).
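
To illustrate the difference, here is a minimal sketch in Python (chosen for
brevity; this is not the code from the pull request, and the function name
ec2_timestamp is made up for this example).  It formats a timestamp in the
same shape as the Timestamp parameter in the request log, always from GMT,
whatever the host's timezone is set to:

```python
import time


def ec2_timestamp(epoch_seconds):
    """Format epoch seconds as a GMT timestamp like 2013-06-17T22:41:39Z.

    Hypothetical helper for illustration only.
    """
    # time.gmtime() ignores the host's timezone setting; time.localtime()
    # would apply it, which is the kind of mismatch described above.
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(epoch_seconds))


if __name__ == "__main__":
    print(ec2_timestamp(time.time()))
```

Swapping gmtime for localtime in the sketch would give a different string on
any host whose local time is not GMT.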

Now I'm facing a consistent segfault when the number of nodes in the cluster is
large (>= 17).

The error looks like:

[discovery.ec2] Found 17 matching instances
[discovery.ec2] adding i-10ad3c25, udp send channel private_ip 10.10.1.211:8649
[discovery.ec2] adding i-34296506, udp send channel private_ip 10.10.1.204:8649
[discovery.ec2] adding i-1894ff2a, udp send channel private_ip 10.10.1.240:8649
[discovery.ec2] adding i-1a94ff28, udp send channel private_ip 10.10.1.241:8649
[discovery.ec2] adding i-cc99f2fe, udp send channel private_ip 10.10.1.214:8649
[discovery.ec2] adding i-c81c8dfd, udp send channel private_ip 10.10.2.115:8649
[discovery.ec2] adding i-a2d36990, udp send channel private_ip 10.10.1.116:8649
[discovery.ec2] adding i-24235016, udp send channel private_ip 10.10.1.234:8649
[discovery.ec2] adding i-2401bc11, udp send channel private_ip 10.10.2.216:8649
[discovery.ec2] adding i-2a235018, udp send channel private_ip 10.10.1.235:8649
[discovery.ec2] adding i-3a01bc0f, udp send channel private_ip 10.10.2.217:8649
[discovery.ec2] adding i-3801bc0d, udp send channel private_ip 10.10.2.218:8649
[discovery.ec2] adding i-d27015e7, udp send channel private_ip 10.10.2.164:8649
[discovery.ec2] adding i-2823501a, udp send channel private_ip 10.10.1.238:8649
[discovery.ec2] adding i-3a07620f, udp send channel private_ip 10.10.2.177:8649
[discovery.ec2] adding i-422a4f77, udp send channel private_ip 10.10.2.64:8649
[discovery.ec2] adding i-3890f10a, udp send channel private_ip 10.10.1.102:8649
.  .  .

[discovery.ec2] Refreshing node list...
[discovery.cloud] access key=AKIAJNY4GBUKJRXY4JDA, secret key=************************************DxvJ
[discovery.ec2] using host_type [private_ip], tags [environment= TEST], groups [], availability_zones []
[discovery.ec2] using endpoint ec2.us-west-2.amazonaws.com -> ec2.us-west-2.amazonaws.com
[discovery.ec2] URL-encoded API request ec2.us-west-2.amazonaws.com?AWSAccessKeyId=AKIAJNY4GBUKJRXY4JDA&Action=DescribeInstances&Filter.1.Name=instance-state-name&Filter.1.Value=running&Filter.2.Name=tag%3Aenvironment&Filter.2.Value=TEST&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2013-06-17T22%3A41%3A39Z&Version=2012-08-15&Signature=O7qmbgbbZnMk8njNQiEo4YLlDIVhM9NAF4171NoMTj4%3D
[discovery.ec2] HTTP response code 200, 99664 bytes retrieved
Segmentation fault

The crash is reproducible, happens about 2 minutes after start, and can be
avoided by renaming one host's environment= tag to remove that host from the
cluster.

I haven't been able to come up with a fix for this issue, and I'm sufficiently
out of my depth at this point that I'm asking for help.

Thanks.

-D


_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
