On Fri, Aug 31, 2012 at 5:55 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
> 31.08.2012 05:43, Andrew Beekhof wrote:
>> On Wed, Aug 29, 2012 at 8:57 PM, Vladislav Bogdanov
>> <bub...@hoster-ok.com> wrote:
>>> 29.08.2012 13:33, Andrew Beekhof wrote:
>>>> On Wed, Aug 29, 2012 at 4:22 PM, Vladislav Bogdanov
>>>> <bub...@hoster-ok.com> wrote:
>>>>> Hi,
>>>>>
>>>>> It looks like pacemaker (current master)
>>>>
>>>> "current master" changes quite rapidly, could you be specific?
>>>
>>> c72f5ca
>>>
>>>>
>>>>> does not always work nicely on
>>>>> top of corosync2 if one doesn't have /etc/hosts with all cluster nodes
>>>>> in it, where short form of name goes before the long one (so
>>>>> gethostbyaddr() and getnameinfo() return the short one).
>>>>
>>>> I noticed a different issue related to this, but I need to know
>>>> exactly which version you had before I can answer properly.
>>
>> Ok...
>>
>> Pacemaker doesn't actually care about FQDN vs short names.
>> Short names are arguably nicer to look at, but the only thing that
>> really matters is that when node A looks up its own name, the
>> answer is consistent with the answer /other/ nodes get when they look
>> up node A.
>>
>> The problem to date is that local lookups have used uname(3P) while
>> remote lookups are using some other method (like getnameinfo(3)).
>> So I think the first step to fixing this mess is to have everyone
>> using the same mechanism - for corosync 2.x clusters[1] that will
>> almost certainly be the corosync_node_name() function you spotted.
>>
>> If no nodelist[2] is specified in corosync.conf, we use getnameinfo()
>> on the address corosync is bound to - possibly with your amendment
>> below.
>> If there is a node list, we will look for a name in the 'ring0_addr'
>> or 'name' fields.
>> If those fields are missing or contain IP addresses, we fall back to
>> getnameinfo() as per the "no nodelist" case.
>> If none of those work, I guess we fall back to uname() and hope for the best.
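
The lookup order described above can be sketched roughly as follows. This is
only an illustration of the fallback logic, not the code in
lib/cluster/corosync.c: choose_node_name(), usable_name() and is_ip_literal()
are hypothetical helper names, and the three candidate strings are assumed to
have been fetched already (from the nodelist, from getnameinfo() and from
uname() respectively).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if s parses as an IPv4 or IPv6 literal. */
int is_ip_literal(const char *s)
{
    struct in_addr v4;
    struct in6_addr v6;

    return inet_pton(AF_INET, s, &v4) == 1
        || inet_pton(AF_INET6, s, &v6) == 1;
}

/* Hypothetical helper: a candidate is usable if it is present, non-empty
 * and not just an IP address written as a string. */
int usable_name(const char *s)
{
    return s != NULL && strlen(s) > 0 && !is_ip_literal(s);
}

/* Hypothetical sketch of the fallback order from the mail:
 *   1. the 'ring0_addr' or 'name' value from the nodelist, if it is a name
 *   2. the result of a reverse lookup (getnameinfo()), if it is a name
 *   3. whatever uname() reported, hoping for the best
 * The caller supplies all three candidates; NULL means "not available". */
const char *choose_node_name(const char *nodelist_name,
                             const char *dns_name,
                             const char *uname_name)
{
    assert(uname_name != NULL);

    if (usable_name(nodelist_name)) {
        return nodelist_name;
    }
    if (usable_name(dns_name)) {
        return dns_name;
    }
    return uname_name;
}
```

For example, choose_node_name("192.168.122.101", "pcmk-1", "host-a") skips the
nodelist entry because it is an address and picks "pcmk-1" instead. The point
of routing every lookup through one function like this is that local and
remote lookups can no longer disagree about the order.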
>
> That is sane, thank you for explanation.
>
>>
>>
>> I'm going to make this the first thing I do after 1.1.8 comes out
>> (we're waiting on http://bugs.clusterlabs.org/show_bug.cgi?id=5044 and
>> some final CTS runs).
>
> Btw, to 1.1.8, I spotted two paths (in c72f5ca) where stonithd dumps
> core, one sigsegv when doing manual ack and one assert when queuing
> remote operation (may be vice versa, can't look right now).
> Is this under control of CTS?
No. Do please report any segfaults you see ASAP.

>
>
> Vladislav
>
>> If someone wants to help out before then, I would certainly not complain :)
>>
>> -- Andrew
>>
>> [1] We will implement equivalent functions for the other cluster types.
>> [2] The nodelist section looks something like:
>> nodelist {
>>     node {
>>         nodeid: 1
>>         ring0_addr: pcmk-1
>>         quorum_votes: 1
>>     }
>>     node {
>>         nodeid: 2
>>         ring0_addr: pcmk-2
>>         quorum_votes: 2
>>     }
>> }
>>
>>
>>
>>>>
>>>>> I tried to run
>>>>> test cluster with stub /etc/hosts but fully functional name server, and
>>>>> I see that pacemaker includes long nodenames (fqdn) into nodelist, while
>>>>> expecting them to be equal to what uname() returns for the local node.
>>>>> After I created needed entries in /etc/hosts everything began to work.
>>>>> From getaddrinfo manpage, NI_NOFQDN flag should help to avoid this
>>>>> behavior.
>>>
>>> s/getaddrinfo/getnameinfo/
>>>
>>> Actually it doesn't. At least not always.
>>> Problem is that hostname (nodename) may be either fqdn (like anaconda
>>> tries to set) or contain only host part. And getnameinfo() is not
>>> consistent here (as in EL6), it strips domainname of a local system with
>>> leading dot if local hostname is FQDN, but returns FQDN which
>>> corresponds to address being searched if hostname is host-only.
>>>
>>> So, I tried following patch and it works perfectly for me (hostnames are
>>> host-only, and DNS is correctly configured, so hostname -f returns FQDN).
>>>
>>> diff -urNp a/lib/cluster/corosync.c b/lib/cluster/corosync.c
>>> --- a/lib/cluster/corosync.c 2012-08-29 07:32:57.000000000 +0000
>>> +++ b/lib/cluster/corosync.c 2012-08-29 07:33:54.730099738 +0000
>>> @@ -207,7 +207,15 @@ static char *corosync_node_name(cmap_han
>>>              addrlen = sizeof(struct sockaddr_in);
>>>          }
>>>
>>> -        if (getnameinfo((struct sockaddr *)addrs[0].address,
>>> addrlen, buf, sizeof(buf), NULL, 0, 0) == 0) {
>>> +        if (getnameinfo((struct sockaddr *)addrs[0].address,
>>> addrlen, buf, sizeof(buf), NULL, 0, NI_NAMEREQD) == 0) {
>>> +            char *p = buf;
>>> +            while (*p) {
>>> +                if (*p == '.') {
>>> +                    *p = '\0';
>>> +                    break;
>>> +                }
>>> +                p++;
>>> +            }
>>>              crm_notice("Inferred node name '%s' for nodeid %u from
>>> DNS", buf, nodeid);
>>>
>>>              if(corosync_name_is_valid("DNS", buf)) {
>>>
>>>
>>> Now I do not see FQDNs in nodelist.
>>> Grrr, line wrapping...
>>>
>>>>> Additionally, NI_NAMEREQD flag should probably be also used.
>>>
>>> This one still applies. Otherwise getnameinfo can return string
>>> representation of IP address if it cannot resolve it.
>>
>> That's not a big deal, corosync_name_is_valid() will detect this and
>> refuse to use it.
>>
>>>
>>> Btw, NI_MAXHOST should be used instead of INET6_ADDRSTRLEN for buf there.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org