On 05/09/2013, at 6:37 PM, Christine Caulfield <ccaul...@redhat.com> wrote:
> On 03/09/13 22:03, Andrew Beekhof wrote:
>>
>> On 03/09/2013, at 11:49 PM, Christine Caulfield <ccaul...@redhat.com> wrote:
>>
>>> On 03/09/13 05:20, Andrew Beekhof wrote:
>>>>
>>>> On 02/09/2013, at 5:27 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>
>>>>> 30.08.2013, 07:18, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>> On 29/08/2013, at 7:31 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>
>>>>>>> 29.08.2013, 12:25, "Andrey Groshev" <gre...@yandex.ru>:
>>>>>>>> 29.08.2013, 02:55, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>> On 28/08/2013, at 5:38 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>> 28.08.2013, 04:06, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>> On 27/08/2013, at 1:13 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>> 27.08.2013, 05:39, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>> On 26/08/2013, at 3:09 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>> 26.08.2013, 03:34, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>> On 23/08/2013, at 9:39 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Today I tried to rebuild my test cluster from cman to corosync2,
>>>>>>>>>>>>>>>> and I noticed the following:
>>>>>>>>>>>>>>>> if I reset a cman-based cluster with cibadmin --erase --force,
>>>>>>>>>>>>>>>> the node names still remain in the cib.
>>>>>>>>>>>>>>> Yes, the cluster automagically puts back entries for all the nodes it knows about.
>>>>>>>>>>>>>>>> cibadmin -Ql
>>>>>>>>>>>>>>>> .....
>>>>>>>>>>>>>>>> <nodes>
>>>>>>>>>>>>>>>>   <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
>>>>>>>>>>>>>>>>   <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
>>>>>>>>>>>>>>>>   <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
>>>>>>>>>>>>>>>> </nodes>
>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Even if cman and pacemaker are running on only one node.
>>>>>>>>>>>>>>> I'm assuming all three are configured in cluster.conf?
>>>>>>>>>>>>>> Yes, the list of nodes is there.
>>>>>>>>>>>>>>>> And if I do the same on a cluster with corosync2,
>>>>>>>>>>>>>>>> I see only the names of the nodes that are running corosync and pacemaker.
>>>>>>>>>>>>>>> Since you've not included your config, I can only guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>>>>>> If it did, you should get the same behaviour.
>>>>>>>>>>>>>> I tried both expected_node and a nodelist.
>>>>>>>>>>>>> And it didn't work? What version of pacemaker?
>>>>>>>>>>>> It does not work as I expected.
>>>>>>>>>>> That's because you've used IP addresses in the node list, ie.
>>>>>>>>>>>
>>>>>>>>>>> node {
>>>>>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> try including the node name as well, eg.
>>>>>>>>>>>
>>>>>>>>>>> node {
>>>>>>>>>>>     name: dev-cluster2-node2
>>>>>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>>>>>> }
>>>>>>>>>> The same thing.
>>>>>>>>> I don't know what to say. I tested it here yesterday and it worked as expected.
>>>>>>>> I found the reason that you and I have different results - I did not have a reverse DNS zone for these nodes.
>>>>>>>> I know there should be one, but (PACEMAKER + CMAN) worked without a reverse zone!
>>>>>>> Too hasty. Deleted everything. Reinstalled. Configured. Not working again. Damn!
>>>>>>
>>>>>> It would have surprised me...
>>>>>> pacemaker 1.1.11 doesn't do any dns lookups - reverse or otherwise.
>>>>>> Can you set
>>>>>>
>>>>>> PCMK_trace_files=corosync.c
>>>>>>
>>>>>> in your environment and retest?
>>>>>>
>>>>>> On RHEL6 that means putting the following in /etc/sysconfig/pacemaker:
>>>>>>
>>>>>> export PCMK_trace_files=corosync.c
>>>>>>
>>>>>> It should produce additional logging [1] that will help diagnose the issue.
>>>>>>
>>>>>> [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>>>>>
>>>>> Hello, Andrew.
>>>>>
>>>>> You misunderstood me a little.
>>>>
>>>> No, I understood you fine.
>>>>
>>>>> I wrote that I had rushed to judgment.
>>>>> After I created the reverse DNS zone, the cluster behaved correctly.
>>>>> BUT after I took the cluster apart, dropped the configs and restarted as a new cluster,
>>>>> the cluster again did not show all the nodes (only the node with pacemaker running).
>>>>>
>>>>> A small portion of the log. Full log
>>>>> In which (I thought) there is something interesting.
>>>>> Aug 30 12:31:11 [9986] dev-cluster2-node4 cib: (corosync.c:423) trace: check_message_sanity: Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (corosync.c:96) trace: corosync_node_name: Checking 172793107 vs 0 from nodelist.node.0.nodeid
>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ipcc.c:378) debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (corosync.c:134) notice: corosync_node_name: Unable to get node name for nodeid 172793107
>>>>
>>>> I wonder if you need to be including the nodeid too. ie.
>>>>
>>>> node {
>>>>     name: dev-cluster2-node2
>>>>     ring0_addr: 10.76.157.17
>>>>     nodeid: 2
>>>> }
>>>>
>>>> I _thought_ that was implicit.
>>>> Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2 or only if explicitly defined in the config?
>>>
>>> You do need to specify a nodeid if you don't want corosync to imply it from the IP address (or if you're using IPv6). corosync won't imply a nodeid from the order of the nodes in corosync.conf - that's not reliable enough.
>>
>> Right, but is that implied nodeid available as "nodelist.node.%d.nodeid"?
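[Editor's note: pulling the pieces of the thread together, a corosync.conf nodelist that lets pacemaker resolve node names without DNS would look roughly like the sketch below. The names, addresses and nodeids are illustrative, taken from or extrapolated from the examples in the thread; the second node's address is hypothetical.]

```
nodelist {
    node {
        name: dev-cluster2-node2
        ring0_addr: 10.76.157.17
        nodeid: 2
    }
    node {
        name: dev-cluster2-node3
        ring0_addr: 10.76.157.18    # hypothetical address
        nodeid: 3
    }
}
```

On corosync 2.x, the keys that actually made it into cmap can be inspected at runtime with something like `corosync-cmapctl | grep nodelist.` - which is one way to check whether `nodelist.node.%d.nodeid` is present.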
>> Andrey's results suggest "no", and I would claim this is not expected/good :)
>
> If you want to get the nodeid of the node you are on

No, we're trying to use a known nodeid to look up the other information in the node list - such as 'ring0_addr' or 'name'.

> there is both a corosync API call for it - totem_nodeid_get() - or you can get it from votequorum via cmap - runtime.votequorum.this_node_id
>
> The nodelist.* section of cmap is really meant to reflect what is in corosync.conf, and I don't really want to be writing into it. I know there is already nodelist.our_node_pos, but I'm not a fan of that either :P
>
> Chrissie
>
>>> Also bear in mind that 0 is not a valid node number :-)
>>>
>>> Chrissie
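[Editor's note: the lookup being debated above - pacemaker walking the `nodelist.node.%d.*` cmap keys to turn a nodeid into a name - can be emulated in plain Python. This is a sketch under assumptions: the cmap dictionary contents are hypothetical stand-ins for corosync's cmap database, and the fallback from `name` to `ring0_addr` mirrors what the thread implies rather than the exact corosync.c source.]

```python
# Sketch of the nodeid -> name lookup discussed above (pacemaker's
# corosync_node_name() walking the nodelist.node.%d.* cmap keys).
# The cmap contents below are hypothetical; the real values live in
# corosync's cmap database and are read via the cmap API.

def corosync_node_name(cmap, nodeid):
    """Return the 'name' of the nodelist entry whose nodeid matches,
    falling back to ring0_addr; None if no entry matches (the
    'Unable to get node name for nodeid ...' case in Andrey's log)."""
    i = 0
    while True:
        key = "nodelist.node.%d.nodeid" % i
        if key not in cmap:       # ran off the end of the node list
            return None
        if cmap[key] == nodeid:
            return (cmap.get("nodelist.node.%d.name" % i)
                    or cmap.get("nodelist.node.%d.ring0_addr" % i))
        i += 1

# Hypothetical cmap snapshot: node 0 has an explicit name, node 1 does not.
cmap = {
    "nodelist.node.0.nodeid": 2,
    "nodelist.node.0.name": "dev-cluster2-node2",
    "nodelist.node.0.ring0_addr": "10.76.157.17",
    "nodelist.node.1.nodeid": 3,
    "nodelist.node.1.ring0_addr": "10.76.157.18",
}

print(corosync_node_name(cmap, 2))          # -> dev-cluster2-node2
print(corosync_node_name(cmap, 3))          # -> 10.76.157.18 (no name key)
print(corosync_node_name(cmap, 172793107))  # -> None, as in the log above
```

The sketch also shows why the discussion circles around explicit `nodeid:` entries: if `nodelist.node.%d.nodeid` is absent from cmap for implied nodeids, the loop terminates at the first missing key and the lookup fails even though the node is listed.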
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org