That solution would be fine too. I didn't mean to imply that ganglia should be using bind; I just meant to offer it as one possible solution. As long as the hostnames that ganglia gets from the IPs are consistently in FQDN form, it would be better than the current situation, which results in a mixture of short and full names. Maybe add a check to verify that the hostname lookup returned an FQDN and, if it didn't, append the domain name to it.
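
Something along these lines is roughly what I have in mind for that check. It is only a sketch: ensure_fqdn is a made-up name, not anything in ganglia today, and where the default domain comes from is left to the caller. It also assumes that any name containing a dot is already fully qualified, which is the same test Steven suggested.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ganglia): write a fully qualified
 * form of `host` into `out`.
 *
 * If `host` already contains a dot it is assumed to be an FQDN and is
 * copied unchanged. Otherwise `domain` is appended after a dot. If no
 * usable domain is supplied, the short name is copied as-is.
 *
 * Returns 0 on success, -1 on bad arguments or if the result does not
 * fit in `outlen` bytes (including the terminating NUL).
 */
int ensure_fqdn(const char *host, const char *domain,
                char *out, size_t outlen)
{
    int n;

    if (host == NULL || host[0] == '\0' || out == NULL || outlen == 0)
        return -1;

    /* Tolerate a domain written with a leading dot, e.g. ".domain.name". */
    if (domain != NULL && domain[0] == '.')
        domain++;

    if (strchr(host, '.') != NULL || domain == NULL || domain[0] == '\0')
        n = snprintf(out, outlen, "%s", host);             /* keep as-is */
    else
        n = snprintf(out, outlen, "%s.%s", host, domain);  /* qualify it */

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called as ensure_fqdn("host", "domain.name", buf, sizeof buf), this leaves "host.domain.name" in buf, and a name that already has a dot in it is passed through untouched.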

On the other hand, would relying on DNS really be that bad? Is it any worse than relying on each node to have various config files set up correctly, including /etc/hosts entries? I don't know the answers to these questions myself. Maybe it could be a configure option to build ganglia with DNS bind support or without it (relying on the current method and local configs), or an automatic configure test to see if DNS is available.

~Jason

On Thu, 2003-03-20 at 18:49, Steven Wagner wrote:
> An amazing variety of things are keyed off an /etc/hosts entry in Solaris
> (including network interface configuration, if memory serves). This man
> speaks the truth.
>
> However, I put it to you, world, that if this has only popped up on Sun
> boxes, a better solution might be to quietly change solaris.c's method of
> determining the hostname. Either changing it to a DNS lookup or by having
> it figure out a FQDN by concatenating hostname and the default domain, if
> the string it's about to return doesn't contain a dot.
>
> I really hate to build in a dependence on DNS - it makes everything that
> much more "fun" when nameservice isn't available...
>
> (then again, in the Sun world you usually place all your eggs in the NIS
> basket, so what's the difference? hmmm... )
>
> Jason A. Smith wrote:
> > I noticed a potential problem that results from the use of the
> > gethostbyXXX functions. We have a small group of Sun file servers that
> > we want to monitor with ganglia. The problem originates from having an
> > /etc/hosts file that includes an entry for your local host that has both
> > the short and fqdn forms of the hostname defined, like:
> >
> > xxx.xxx.xxx.xxx host host.domain.name
> >
> > but no other hosts in your organization. If you have a setup like that,
> > and the default or standard config in the appropriate /etc/nsswitch.*
> > config file:
> >
> > hosts: files dns
> >
> > then a gethostbyXXX lookup will return the short name for your local
> > host, but the fqdn for all other hosts in your multicast cluster. This
> > is obviously a problem because depending on which host in your cluster
> > gmetad queries, you will end up with one short hostname and the rest
> > long. Also, if your first host fails to respond to gmetad, then the
> > next host might be queried resulting in two previously unknown hostnames
> > that will now be displayed and saved to new rrds. I have been told that
> > changing the local config is not really an option for us. I am not a
> > Sun/Solaris admin, but have been told by our admins here that there was
> > some weird magic that having your /etc/hosts file configured in this way
> > solved some strange problems. It may be mostly historical now, but our
> > admins are understandably reluctant to change an otherwise working
> > config. I am no expert on this, but others have told me that a better
> > way to do hostname lookups is to always use DNS lookups (if your
> > facility is configured to use DNS) through the bind API and not rely on
> > the local unix config and the gethostbyXXX functions.
> >
> > ~Jason
> >
> >
-- 
/------------------------------------------------------------------\
|  Jason A. Smith                         Email: [EMAIL PROTECTED] |
|  Atlas Computing Facility, Bldg. 510M   Phone: (631)344-4226     |
|  Brookhaven National Lab, P.O. Box 5000 Fax:   (631)344-7616     |
|  Upton, NY 11973-5000                                            |
\------------------------------------------------------------------/