Nicholas Satterly <nfsatterly <at> gmail.com> writes:

> [1] https://github.com/ganglia/monitor-core/compare/master...feature/cloud
Nick,

Thanks for your work in implementing this feature. I'm in the same boat with a large(ish) EC2 (VPC) deployment and sorely missing ganglia in this new environment.

I've found and fixed one bug pertaining to localtime versus GMT in the EC2 apr request:

https://github.com/ganglia/monitor-core/pull/112

Amazon expects all timestamps to be in GMT. Some of my hosts have non-GMT localtimes set (don't ask).

Now I'm facing a consistent segfault when the number of nodes in the cluster is large (>= 17). The error looks like:

[discovery.ec2] Found 17 matching instances
[discovery.ec2] adding i-10ad3c25, udp send channel private_ip 10.10.1.211:8649
[discovery.ec2] adding i-34296506, udp send channel private_ip 10.10.1.204:8649
[discovery.ec2] adding i-1894ff2a, udp send channel private_ip 10.10.1.240:8649
[discovery.ec2] adding i-1a94ff28, udp send channel private_ip 10.10.1.241:8649
[discovery.ec2] adding i-cc99f2fe, udp send channel private_ip 10.10.1.214:8649
[discovery.ec2] adding i-c81c8dfd, udp send channel private_ip 10.10.2.115:8649
[discovery.ec2] adding i-a2d36990, udp send channel private_ip 10.10.1.116:8649
[discovery.ec2] adding i-24235016, udp send channel private_ip 10.10.1.234:8649
[discovery.ec2] adding i-2401bc11, udp send channel private_ip 10.10.2.216:8649
[discovery.ec2] adding i-2a235018, udp send channel private_ip 10.10.1.235:8649
[discovery.ec2] adding i-3a01bc0f, udp send channel private_ip 10.10.2.217:8649
[discovery.ec2] adding i-3801bc0d, udp send channel private_ip 10.10.2.218:8649
[discovery.ec2] adding i-d27015e7, udp send channel private_ip 10.10.2.164:8649
[discovery.ec2] adding i-2823501a, udp send channel private_ip 10.10.1.238:8649
[discovery.ec2] adding i-3a07620f, udp send channel private_ip 10.10.2.177:8649
[discovery.ec2] adding i-422a4f77, udp send channel private_ip 10.10.2.64:8649
[discovery.ec2] adding i-3890f10a, udp send channel private_ip 10.10.1.102:8649
.
.
.
[discovery.ec2] Refreshing node list...
[discovery.cloud] access key=AKIAJNY4GBUKJRXY4JDA, secret key=************************************DxvJ
[discovery.ec2] using host_type [private_ip], tags [environment=TEST], groups [], availability_zones []
[discovery.ec2] using endpoint ec2.us-west-2.amazonaws.com -> ec2.us-west-2.amazonaws.com
[discovery.ec2] URL-encoded API request ec2.us-west-2.amazonaws.com?AWSAccessKeyId=AKIAJNY4GBUKJRXY4JDA&Action=DescribeInstances&Filter.1.Name=instance-state-name&Filter.1.Value=running&Filter.2.Name=tag%3Aenvironment&Filter.2.Value=TEST&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2013-06-17T22%3A41%3A39Z&Version=2012-08-15&Signature=O7qmbgbbZnMk8njNQiEo4YLlDIVhM9NAF4171NoMTj4%3D
[discovery.ec2] HTTP response code 200, 99664 bytes retrieved
Segmentation fault

The crash is reproducible, happens about 2 minutes after start, and can be avoided by renaming one host's environment= tag to remove it from the cluster.

I haven't been able to come up with a fix for this issue, and I'm sufficiently out of my depth at this point to ask for help.

Thanks.

-D
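
P.S. To illustrate the GMT point above, here is a minimal sketch in plain standard C. It is not the code from the pull request; the helper name format_ec2_timestamp is hypothetical. It builds the same kind of timestamp that appears (URL-encoded) in the request log, using gmtime_r() so the host's local timezone setting has no effect on the result.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not part of ganglia): format `t` as an ISO 8601
 * timestamp in GMT/UTC, e.g. "2013-06-17T22:41:39Z".
 * Returns 0 on success, -1 if the conversion fails or `buf` is too small.
 */
int format_ec2_timestamp(time_t t, char *buf, size_t len)
{
    struct tm tm_utc;

    /* gmtime_r() always breaks the time down in UTC; localtime_r() would
     * apply the host's timezone, which is the bug described above. */
    if (gmtime_r(&t, &tm_utc) == NULL)
        return -1;

    /* strftime() returns 0 when the result does not fit in `buf`. */
    if (strftime(buf, len, "%Y-%m-%dT%H:%M:%SZ", &tm_utc) == 0)
        return -1;

    return 0;
}
```

Calling format_ec2_timestamp(time(NULL), buf, sizeof buf) with a 32-byte buffer should give a string suitable for the Timestamp= parameter once it has been URL-encoded.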